Foreword


During the second week of July, the ICIAM 2019 Congress took place in Valencia
with almost 4,000 participants, 50 plenary talks, more than 300 mini-symposia,
550 contributed talks and 250 posters. A wide representation of the world's applied
mathematics community met in Valencia to present and discuss how mathematics is
applied across the most diverse disciplines, such as applied mathematics for industry
and engineering, biology, medicine and other natural sciences, control and systems
theory, dynamical systems and nonlinear analysis, finance and management science,
industrial mathematics, mathematics and computer science, numerical analysis, partial
differential equations, and simulation and modeling, to name a few.

Within the organizing committee, the idea arose that these presentations and
discussions should be preserved in some form for the future. Springer then offered
to launch a series of volumes recording the most notable advances presented at the
congress.

This offer crystallized in the ICIAM 2019 SEMA SIMAI Springer Series, which
includes the present volume, dedicated to the lectures of the invited speakers. This
volume occupies a very central and special place in the series, since it is published
in open access thanks to the support of the Sociedad Española de Matemática Aplicada (SeMA).

The 336 mini-symposia of ICIAM 2019 were selected by its academic committee.
Closely linked to it, the editorial committee of this series was formed by
F. Arándiga Llaudes, M. Gómez Mármol, F. Guillén-González, F. Ortegón Gallego,
C. Parés, P. Quintela, C. Vázquez-Cendón, S. Xambó-Descamps and myself. The
members of this committee were in charge of selecting the proposals, many of them
derived from congress mini-symposia, and of acting as editors in charge of some of
the 14 volumes that make up this series:


1. Recent Advances in Industrial and Applied Mathematics, edited by Tomás 
Chacón Rebollo, Rosa Donat and Inmaculada Higueras. 

2. Stabilization of Distributed Parameter Systems: Design Methods and
Applications, edited by Grigory Sklyar and Alexander Zuyev.

3. Cartesian CFD Methods for Complex Applications, edited by Ralf Deiterding, 
Margarete Oliveira and Kai Schneider. 




4. Applications of Wavelet Multiresolution Analysis, edited by Juan Pablo 
Muszkats, Silvia Alejandra Seminara and Maria Inés Troparevsky. 

5. Progress in Industrial Mathematics: Success Stories, edited by Manuel Cruz, 
Carlos Parés and Peregrina Quintela. 

6. Applied Mathematics for Environmental Problems, edited by María Isabel 
Asensio, Albert Oliver and José Sarrate. 

7. Improving Applied Mathematics Education, edited by Ron Buckmire and 
Jessica M. Libertini. 

8. Fractals in Engineering: Theoretical Aspects and Numerical Approximations, 
edited by Maria Rosaria Lancia and Anna Rozanova-Pierrat. 

9. Recent Advances in Differential Equations and Control Theory, edited by 
Concepción Muriel and Carmen Pérez-Martinez. 

10. Emerging Problems in the Homogenization of Partial Differential Equations, 
edited by Patrizia Donato and Manuel Luna-Laynez. 

11. Multidisciplinary Mathematical Modeling, edited by Francesc Font and Tim 
Myers. 

12. Mathematical Descriptions of Traffic Flow: Micro, Macro and Kinetic Models, 
edited by Gabriella Puppo and Andrea Tosin. 

13. Systems, Patterns and Data Engineering with Geometric Calculi, edited by 
Sebastia Xambó-Descamps. 

14. Modeling, Simulation and Optimization in the Health and Energy Sector, 
edited by Rene Pinnau, Nicolas R. Gauger and Axel Klar. 


As can easily be seen, the applications of mathematics spread across the most
diverse areas, such as industry, health and energy, engineering, data science,
environmental problems, geometric calculi, numerical approximation, traffic flow,
education, etc.

Now is the time for the reader to delve into the volumes of this series and learn,
reflect, incorporate new ideas and generally enjoy their content, in the hope that the
volumes of this series can serve as a reference for even more innovative applications
of mathematics in the future.

Finally, it is time for acknowledgements. These begin with the ICIAM 2019 Congress,
especially its executive committee led by Tomás Chacón and Rosa Donat as the driving
forces of the event, as well as the scientific committee led by Alfio Quarteroni and
the many organizers of mini-symposia, speakers and attendees. They continue with
Francesca Bonadei, who promoted within Springer the need for this series, and with
the members of the editorial board of this series, and end with the editors in charge
and the authors of each volume, who, through their excellent work, are the real
creators of the message of this series.


Barcelona, Spain Amadeu Delshams 


Preface 


The papers appearing in this volume are authored by some of the invited speakers
of the 9th International Congress on Industrial and Applied Mathematics, held
in València from July 15 to 19, 2019. This volume is part of a series dedicated to
ICIAM 2019-Valencia.

The congress, hosted by the Spanish Society for Applied Mathematics (SeMA),
was organized at the Universitat de València (Spain) on behalf of the International
Council for Industrial and Applied Mathematics (ICIAM). With 3983 participants
from 99 different countries, more than 3400 lectures delivered and nearly 250 poster
presentations, ICIAM 2019 has been a great success. These figures represent a net
increase in participation, continuing the rising trend of previous editions of this
series of events, and can be considered sound proof of the growing interest of the
applied and industrial mathematics community in ICIAM congresses.

The industrial aspect of the congress was further enriched by a specific activity
oriented toward mathematical technology transfer: ‘The Industry Day’. Fourteen
speakers, selected from a broad representation of different sectors, presented the
results of ongoing collaborations with academia and the benefits derived from them,
such as better products and services, optimization of processes, organization and
accounting, and growth and innovation. In addition, 19 industrial mini-symposia
were scheduled during the congress, and 48 ‘industry-related’ posters were on display
during ‘The Industry Day’.

Thirty-five satellite events took place during 2018 and 2019, covering a broad
range of topics within industrial and applied mathematics. These events included
two CIMPA schools (Kenitra, Morocco, and Tunis, Tunisia, 2019), devoted to
introducing young students from developing countries to research. In addition, several
Spanish towns and regions (Bilbao, Galicia, Málaga, Seville and Zaragoza) were
appointed sub-venues of ICIAM 2019-Valencia and, as such, organized 12 satellite
events. We are deeply thankful to the organizers of all satellite events.

The preparation of the candidacy in 2012 started the long process involved in
planning this complex event. Our deepest gratitude and heartiest thanks go to all
the people who contributed their skills to creating ICIAM 2019-Valencia. A list of
all the committees and people involved in this task is given in this book.




The congress would not have been possible without the support of a large set of
sponsors. Special mention is due to our main sponsors: Banco Santander, which
financed over 70% of the Grant Program of the congress, and the Universitat de
València, for its generous offer to make its facilities available to hold the
conference. Thanks are also due to the Spanish universities that funded over
20% of the Grant Program and to the individual donors who contributed the
remaining 5%.

A thankful recognition is also due to our four institutional sponsors: the Ministry of
Science, Innovation and Universities, the Generalitat Valenciana, the Diputació de
València and the Ajuntament de València.

On behalf of ICIAM 2019, we would like to express our most sincere gratitude to
the invited speakers who have contributed to this volume for taking the time to provide
their valuable contributions, helping us to make this the reference publication of the
congress.


Sevilla, Spain Tomás Chacón Rebollo 
Valencia, Spain Rosa Donat 
Pamplona, Spain Inmaculada Higueras 


ICIAM Congresses 


1987 Paris
1991 Washington, D.C.
1995 Hamburg
1999 Edinburgh
2003 Sydney
2007 Zurich
2011 Vancouver
2015 Beijing
2019 Valencia


Traditional Valencian dances and Muixeranga (human towers)


Plenary talk (slide titled ‘The Onsager Conjecture’)


Closing ceremony and transfer of ICIAM flag 


ICIAM Prize Winners 


ICIAM Collatz Prize 

1999 Stefan Müller, Max Planck Institut für Mathematik Leipzig, Germany. 

2003  Weinan E, Princeton University, USA. 

2007 Felix Otto, Universität Bonn, Germany. 

2011 Emmanuel J. Candès, Stanford University and CALTECH, USA.

2015 Annalisa Buffa, CNR-IMATI, Italy. 

2019 Siddhartha Mishra, ETH Zürich, Switzerland.

ICIAM Lagrange Prize 

1999  Jacques-L. Lions, Collège de France and Académie des Sciences de Paris, 
France. 

2003 Enrico Magenes, Università di Pavia, Italy. 

2007 Joseph Keller, Stanford University, USA. 

2011 Alexandre J. Chorin, U.C. Berkeley and LBNL, USA. 

2015 Andrew J. Majda, New York University, USA. 

2019 George Papanicolaou, Stanford University, USA. 

ICIAM Maxwell Prize 

1999 Grigory I. Barenblatt, UC Berkeley, USA, and University of Cambridge, UK. 

2003 Martin D. Kruskal, Rutgers University, USA. 

2007 Peter Deuflhard, Zuse Institute Berlin, Germany. 

2011 Vladimir Rokhlin, Yale University, USA. 




2015 Jean-Michel Coron, Université Pierre et Marie Curie, France. 
2019 Claude Bardos, Université Paris Diderot (Paris VII), France.
ICIAM Pioneer Prize 
1999 Ronald R. Coifman, Yale University, USA, 
Helmut Neunzert, University of Kaiserslautern, Germany. 
2003 Stanley Osher, University of California, Los Angeles, USA. 
2007 Ingrid Daubechies, Princeton University, USA, 
Heinz Engl, Johannes Kepler University, and Austrian Academy of Sciences, 
Austria. 
2011 James Albert Sethian, UC Berkeley and LBNL, USA. 
2015 Björn Engquist, The University of Texas at Austin, USA.
2019 Yvon Maday, Sorbonne University, Paris, France. 
ICIAM Su Buchin Prize 
2007 Gilbert Strang, Massachusetts Institute of Technology, USA. 
2011 Edward Lungu, University of Botswana, Botswana. 
2015 Li Tatsien, Fudan University, P.R. China. 
2019 Giulia Di Nunno, University of Oslo, Norway. 


2019 ICIAM Prize Ceremony. From left to right: Joan Ribó (Mayor of Valencia), Ximo Puig (President
of the Generalitat Valenciana), Y. Maday, G. Di Nunno, His Majesty Felipe VI, G. Papanicolaou,
C. Bardos, S. Mishra, M. Esteban and Pedro Duque (Minister for Science, Innovation and
Universities)


Organization of ICIAM 2019-Valencia 


Congress Director: Tomás Chacón, University of Seville, Spain


Honorary Committee 


President: His Majesty King Felipe VI of Spain 
Members 


Mr. Pedro Duque, Minister for Science, Innovation and Universities, Spain 

Mr. Ximo Puig, President of Generalitat Valenciana, Spain 

Mr. Vicent Marzà, Conseller d'Educació, Investigació, Cultura i Esport of Generalitat
Valenciana, Spain

Prof. Josefina Bueno, Directora General d'Universitats of Generalitat Valenciana,
Spain

Prof. Julio Abalde, Rector, University of A Coruña, Spain

Prof. José Ángel Narváez, Rector, University of Malaga, Spain

Prof. Nekane Balluerca, Rector, University of the Basque Country, Spain

Prof. José Mora, Rector, Universitat Politècnica de València, Spain

Prof. Antonio López, Rector, University of Santiago de Compostela, Spain

Prof. Miguel Ángel Castro, Rector, University of Seville, Spain

Prof. M. Vicenta Mestre, Rector, Universitat de València, Spain

Prof. Manuel Joaquín Reigosa, Rector, Universidade de Vigo, Spain

Prof. José Antonio Mayoral, Rector, University of Zaragoza, Spain 

Mrs. Ana Botín, President of Banco Santander 




Scientific Program Committee 


Chair 
Alfio Quarteroni, EPFL, Lausanne, Switzerland, and Politecnico di Milano, Italy 
Members 


Tony F. Chan, Hong Kong, China 

Manuel Doblaré Castellano, Seville, Spain 
Qiang Du, New York, USA 

Enrique Fernández Cara, Seville, Spain 
Irene Fonseca, Pittsburgh, USA 

Irene Gamba, Austin, USA 

Markus Hegland, Canberra, Australia 

Ilse Ipsen, Raleigh, USA 

Ravi Kannan, Bangalore, India 

Claudia Klüppelberg, Munich, Germany
Karl Kunisch, Graz, Austria 

Yasumasa Nishiura, Sendai, Japan 

Benoit Perthame, Paris, France 

Daya Reddy, Rondebosch, South Africa 
Claudia Sagastizábal, Rio de Janeiro, Brazil
Jeffrey Saltzman, Waltham, USA 

Wil Schilders, Eindhoven, Netherlands 
Endre Süli, Oxford, UK

Eric Vanden Eijnden, New York, USA 
Pingwen Zhang, Beijing, China 


Executive Committee 


Chair 
Tomás Chacón (US) 
Co-Chairs 


Rosa Donat (UV) 
Luis Vega (UPV/EHU) 


Members 


María Paz Calvo (UVA) 
Eduardo Casas (UC) 
Amadeu Delshams (UPC) 
Henar Herrero (UCLM) 




Inmaculada Higueras (UPNA) 
Juan Ignacio Montijano (UNIZAR) 
Peregrina Quintela (USC)

Carlos Vázquez-Cendón (UDC) 
Elena Vázquez-Cendón (USC) 


Thematic Committees 


Academic 
Chair: Amadeu Delshams (UPC) 


Lino Álvarez-Vázquez (UVIGO) 
Rafael Bru (UV) 

Fernando Casas (UJI) 

Eduardo Casas (UC) 

Enrique Fernández-Nieto (US) 
Javier de Frutos (UVA) 

Dolores Gómez-Pedreira (USC) 
Jesús López-Fidalgo (UNAV)
Pep Mulet (UV) 

Francisco Ortegón-Gallego (UCA) 
Francisco Padial (UPM) 

Carlos Vázquez-Cendón (UDC) 


Finance 
Chair: Eduardo Casas (UC) 


Antonio Baeza (UV) 

Luis Alberto Fernández (UC) 
Julio Moro (UC3M) 

Carlos Vázquez-Cendón (UDC) 


Fundraising 


Carlos Vázquez-Cendón (UDC) 
Jesús Sanz-Serna (UC3M)


Industrial Advisory 
Chair: Peregrina Quintela (USC) 


Emilio Carrizosa (US) 

David Pardo (UPV/EHU) 

Antonio Huerta (UPC) 

Carlos Parés (UMA) 

Wenceslao González-Manteiga (USC) 




Communication and Outreach 
Chair: Henar Herrero (UCLM) 


Sergio Blanes (UPV) 

Fernando Casas (UJI) 

Bartomeu Coll (UIB) 

Inmaculada Higueras (UPNA) 
Juan Ignacio Montijano (UNIZAR) 
Alfred Peris (UPV) 

Francisco Ortegón-Gallego (UCA) 
Francisco Pla (UCLM) 

Joan Solá-Morales (UPC) 

Sebastia Xambó-Descamps (UPC) 


Publications and Promotions 
Chair: Inmaculada Higueras (UPNA) 


Rafael Bru (UPV) 

María Paz Calvo (UVA) 

Domingo Hernández-Abreu (ULL) 
Henar Herrero (UCLM) 

Mariano Mateos (UNIOVI) 

Julio Moro (UC3M) 


Satellite and Embedded Meetings 
Chair: Maria Paz Calvo (UVA) 


Francisco Guillén-González (US) 
Carlos Parés (UMA) 

Luis Rández (UNIZAR) 

Carlos Vázquez-Cendón (UDC) 
Luis Vega (UPV/EHU) 


Travel Support Committee 
Chair: Elena Vázquez-Cendón (USC) 


Macarena Gómez-Mármol (US) 

José Manuel González-Vida (UMA) 
Pep Mulet (UV) 

Francisco Javier Sayas (U. of Delaware) 
Rodrigo Trujillo-González (ULL) 


Local Arrangements 
Chair: Rosa Donat (UV) 


José María Amigó (UMH) 
Francesc Arándiga (UV) 




Ana Maria Arnal (UJI) 
Antonio Baeza (UV) 

Sergio Blanes (UPV) 

Rafael Bru (UPV) 

Fernando Casas (UJI) 
Cristina Chiralt (UJI)

Rafael Cantó (UPV) 

José Alberto Conejero (UPV) 
Isabel Cordero-Carrión (UV) 
Cristina Corral (UPV) 

Juan Carlos Cortés (UPV) 
María Teresa Gassó (UPV) 
Olga Gil-Medrano (UV) 
Alicia Herrero (UPV) 

Leila Lebtahi (UV) 

María del Carmen Martí (UV) 
Vicente Martínez (UJI) 

José Mas (UPV) 

José Salvador Moll (UV) 
Francisco Gabriel Morillas-Jurado (UV) 
Pep Mulet (UV) 

Mari Carmen Perea (UMH) 
Rosa Peris (UV) 

Alfred Peris (UPV) 

Sergio Segura de León (UV) 
Ana María Urbano (UPV) 
Pura Vindel (UJI) 


Acronyms of Spanish Universities 


UC3M Carlos III University, Madrid 
UDC University of A Coruña 

UA University of Alicante 

UCA University of Cadiz 

UC University of Cantabria 

UCLM University of Castilla La Mancha 
ULL University of La Laguna 

UMA University of Malaga 

UNAV University of Navarra 

UNIOVI University of Oviedo 

USC University of Santiago de Compostela 
US University of Seville 

UVA University of Valladolid 




UVIGO University of Vigo 

UNIZAR University of Zaragoza 

UPV/EHU University of the Basque Country 
UMH Miguel Hernández University 
UPM Technical University of Madrid 
UPNA Public University of Navarra 

UIB University of the Balearic Islands 
UV Universitat de València

UJI Universitat Jaume I 

UPC Universitat Politècnica de Catalunya
UPV Universitat Politècnica de València


Collaborators at the Universitat de Valencia 


M. Vicenta Mestre, Rector 
Rector’s Cabinet 


Justo Herrera, Vice-rector 
Juan Vte. Climent, Manager 
Beatriz Gómez, Vice-manager 
José Ramírez, Vice-manager 
Joan Enric Úbeda, Director 
Carmen Fayos, Head of Staff 


Facultat de Psicologia 


M. Dolores Sancerni, Dean 
Juan M. Rausell, Administrator 
Juan J. Cancio, Coordinator 
Concierges of the building 


Facultat de Filosofia i Ciències de l'Educació


Rosa M. Bo, Dean 

Francisco J. Moreno, Administrator 
Esther Bolinches, Coordinator 
Concierges of the Building 


Health, Safety and the Environment Service 


M. José Vidal, Head of Staff 
Miguel A. Toledo, Technician 
Verónica Saiz, Technician 
Vicente Caballer, Technician 


Computer Service 


Fuensanta Doménech, Head of Staff 




Faustino Fernandez, IT Infrastructure 
Magdalena Ros, Quality Control 


Blasco Ibáñez Campus Management Unit 


Carmen Tejedo, Administrator 

Dolores Cano, Head of Staff 

M. Angeles Llorens, Head of Staff 
Inmaculada Yuste, Administrative 

M. José Ballester, Services Coordinator 
Maria Luisa Jordan, Concierge 
Concierges of Aularios I, III and VI


Facultat de Medicina i Odontologia 


Francisco J. Chorro, Dean 

M. Vicenta Alandi, Administrator
Guillermo Pérez, Coordinator 
Concierges of the Building 


Facultat de Filologia, Traducció i Comunicació 


Amparo Ricós, Dean 

Francisca Sánchez, Administrator 
Josep M. Valldecabres, Coordinator 
Concierges of the Building 


Facultat de Geografia i Història


Josep Montesinos, Dean 

Joaquín V. Lacasta, Administrator 
Josep Vicó, Coordinator 
Concierges of the Building 


UVSports Service 


Vicent Añó, Director

M. Paz Molina, Administrator 
Francisco Vicent, Coordinator 
Francisco Barceló, Concierge 


Technical and Maintenance Service 


Rosa M. Mochales, Head 

Rafael Antón, Technician 

M. Dolores Yagüe, Technician 
Jorge Vila, Technician 

Ramón Doménech, Maintenance 
Modesto Ramírez, Maintenance 




Carles Aguado, Maintenance 
Diego Cantero, Maintenance 


UVdisability Service 


M. Celeste Asensi, Director 
Restituto Vaño, Accessibility


Technical Unit 


Luis Juaristi, Head of Staff 
Vicente Tarazona, Technician 
José M. Zapata, Technician 


Collaborators at the University of Seville 


Teresa Ayuga 


Staff from External Partners 


Jesús Ibáñez, Security director UV 

Luis Briz and Security Staff from Clece Security 

Concierges from UTE Blasco Ibáñez 

Staff from Grupo Fissa, Cleaning Company. Amparo Cuadrado, Coordinator 
Antonio Gonzalbo and Maintenance Staff from Ferrovial 

Alexandre Andrés and Carlos J. Soler from Valnu 

Ana Mª Gómez and Gardening Staff from Special Employment Center IVASS


Opening Ceremony 


Tomás Chacón Rebollo, Congress Director


Your Majesty, President of the Region of Valencia, Minister of Science, Innovation
and Universities, Mayor of Valencia, President of ICIAM, respected guests and
delegates: on behalf of the Spanish Society for Applied Mathematics and the organizing
committee, it is my pleasure to extend our warmest welcome to the
ICIAM 2019-Valencia Congress.




Mathematics is silently shaping the present technological world. It provides deep
insight into countless processes and systems, thereby advancing scientific knowledge.
It also generates added value in virtually all economic sectors. On top of that,
recent years have witnessed a change in paradigm, as mathematics directly provides
the technological basis of emerging sectors related to data analysis.

Research and knowledge transfer in mathematics have developed rapidly in Spain,
as have all the sciences, since the last decades of the twentieth century; today Spain
ranks 7th in the world in mathematical research by citations. Mathematics plays a
relevant role in the Spanish economy: 10% of the national gross income and 6% of
employment are directly due to its use in economic activity.

The ICIAM 2019 Congress features 27 invited talks, the 5 ICIAM prizes, the Olga
Taussky-Todd Lecture and the Public Lecture. It will include nearly 2000 talks as
well as 250 posters. It also includes three special panels of great interest for
understanding the social framework in which our work as mathematicians takes place.
This is industry talking about mathematics, instead of mathematicians talking about
their collaborations with industry. ICIAM 2019 also includes an Industry Day, where 14
technological companies have agreed to present how mathematics helps to improve
their production processes.

Thanks to four different funding programs, we have been able to offer over 230
scholarships to young researchers as well as to researchers coming from developing
countries. In addition, we have implemented a volunteer program with over 170
young students who will greatly help with the organization of the congress.

All this has been possible thanks to the collaborative work of the Scientific
Program Committee, chaired by Prof. Alfio Quarteroni, and an enthusiastic organizing
committee. I convey my deepest thanks to all of them. Special thanks are addressed
to the Spanish Society for Applied Mathematics and its president, Prof. Rosa Donat,
who also chairs the local organizing committee. Let me also acknowledge the role
of our families, for their support throughout the organization of the congress.

We are indebted to ICIAM for trusting us to organize this congress, and especially
to its past and present presidents, Profs. Barbara Keyfitz and Maria Esteban, for their
help and advice during the organization process. We also address our deepest thanks
to the many organizations that have sponsored the congress: the Spanish Government,
the Region of Valencia, the Diputació de València, the City Council and the University
of Valencia, Spanish centers, departments and institutes of mathematics, Springer
Publishing House, Santander Bank and the many individual donors. We are also
indebted to SIAM for embedding its annual meeting in this ICIAM Congress, and
to all of you for organizing and participating in the many activities that take place
within it.

You find yourself at the perfect time and place to learn about new mathematical 
tools, exchange ideas and move ahead in the thrilling challenge of shaping the world 
with mathematics. 

Welcome to ICIAM 2019-Valencia Congress!! 


Maria J. Esteban, President of ICIAM 
His Majesty the King, President of the Generalitat of Valencia, Mayor of Valencia,
Minister of Science, Innovation and Universities, Congress Director, ladies and
gentlemen, dear colleagues,

It is my great honor and pleasure to welcome you all to ICIAM 2019, the ninth
International Congress on Industrial and Applied Mathematics.

The ICIAM congresses are the main event organized by our international
organization, a network of more than 50 learned societies. The global ICIAM
community covers many countries and all topics related to the applications of
mathematics to the real world: to industry, health, the economy, climate, artificial
intelligence and so on. Mathematics is indispensable for the development of new
technologies and the advancement of our societies. As the recent report on the impact
of mathematics on the Spanish economy shows, investing in mathematics is a very
good idea, because the economic returns are high. This was also apparent in similar
impact studies carried out previously in the UK, the Netherlands and France.

This congress is the occasion when applied and industrial mathematicians from
around the world show each other what they have done in the past years and what
they plan to do next. During these days, we will prepare the future.

Spain was chosen six years ago to organize this big congress, the main event in
our community, which takes place only every four years. In 2015 we were in Beijing,
and in 2023 we will be in Tokyo. Here today, in the beautiful city of Valencia,
we host more than 4000 mathematicians from all over the world: junior and senior,
students, professors, researchers and engineers. During these six years, our Spanish
colleagues have worked nonstop to make this congress a big success. On behalf
of the whole ICIAM community, let me thank the organizers for their huge effort.
Thank you very much to the Spanish Society of Applied Mathematics (SeMA) and
to the whole Spanish applied mathematics community. Thanks also to all the official
Spanish institutions that have offered their support.

And now, to all of you who are eager to see how the congress will unfold, I
wish you a productive week. Just be patient and courageous, because the congress
program is very dense, but this is the only way to show the whole span of our
community's work in only five days. I thank you all for being here, and I wish you
a great congress and a very pleasant week!
Asteroid-Generated Tsunamis: A Review


Marsha Berger 


Abstract We study ocean waves caused by an asteroid airburst located over the 
ocean. The concern is that the waves would damage distant coastal cities. Simple 
qualitative analysis suggests that the wave energy is proportional to the ocean depth 
and the strength and speed of the blast. Computational simulations using GeoClaw 
and the shallow water equations show that explosions from realistic asteroids do not 
endanger distant cities. We explore the validity of the shallow water, Boussinesq, 
and linearized Euler equations to model these water waves. 


1 Introduction 


This talk will review some of the basics behind the simulation of asteroid-generated 
tsunamis, and how this piece of the Asteroid Threat Assessment Program (ATAP) 
got its start. 

In 1994, the United States Congress asked NASA to identify 90% of asteroids
larger than 1 km in diameter that could pose a threat to Earth. This led to the Near Earth
Observing (NEO) program, which catalogued these objects and tried to determine their
characteristics. In 2005, NASA's mission was expanded to track near-Earth objects
greater than 140 m in diameter. Obviously the largest, dinosaur-killing asteroids are
the most dangerous. However, the question arises: how small does an asteroid have
to be before we no longer need to worry about it? Little is known about asteroids
smaller than 140 m in diameter, and whether they are safe to ignore. What if one
exploded over an ocean? Could it generate a tsunami that would turn it from a regional
hazard into a more global one, threatening coastal populations far away?

As it turns out, in February 2013 an approximately 20-m asteroid exploded
about 15 miles above the ground over Chelyabinsk, Russia. This airburst provided an
unprecedented opportunity for data collection. Teams of scientists visited, collected
samples of the meteor to determine its composition, analyzed video from dashboard
cameras in Russian cars to determine the trajectory and energy deposition, canvassed
the region to see how far away windows broke (evidence of the blast overpressure),
etc. [15]. In other words, data were collected that could be used for model validation.
The ATAP project started shortly thereafter.

Fig. 1 Airburst reports from April 1988 to December 2019. Figure taken from
https://cneos.jpl.nasa.gov/fireballs (credit: Alan B. Chamberlin, JPL/Caltech)

A reader might wonder how often such airbursts really occur. Figure 1 shows that
airbursts in fact happen quite regularly. Since most of the Earth's surface is covered
by water, an investigation into airburst-generated tsunamis seems warranted.

In this talk I will focus only on simulations of smaller asteroids that explode
before hitting the ground. There is very little literature on the effects of these
airbursts. There is some literature on simulations of larger asteroids that do reach the
ocean, and sometimes the ocean floor [4, 17, 18]. Impact simulations are generally
performed using hydrocodes that simulate material deformation and failure,
multimaterial phase changes (e.g., water turning into vapor and rising through the
atmosphere), sediment excavation from the ocean floor, shock waves traveling through
water, etc. A nice discussion can be found in the chapter by Gisler in [6]. These are
very expensive calculations, so to reduce cost they tend to be axisymmetric, including
the bathymetry.¹ Asteroid impact simulation is a dynamic area that is receiving
a lot of recent attention [12, 13, 16].

In the next section we present our simulations using the shallow water equations,
modeled with the GeoClaw software package, and describe how GeoClaw was
adapted to model asteroid airbursts. We review our analysis of a model problem
that helps explain the simulation results. However, airburst-generated tsunamis turn
out to have smaller length scales than earthquake-generated tsunamis. Hence we turn
to the linearized Euler equations to bring in the effects of compressibility and
dispersion. It will turn out that dispersion is a much more important factor at the
length scales and pressures of interest, and luckily the shallow water equations, which
neglect dispersion, seem to overestimate the resulting waves. We will conclude that
airburst-generated tsunamis do not pose a global threat. This was the conclusion
reached by all participants in the joint NASA-NOAA tsunami workshop in 2016, using
a variety of codes and test problems, as summarized in [11].

¹ Bathymetry is underwater topography.
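To make the role of dispersion concrete, the sketch below compares the phase speed given by standard linear water-wave theory, $c = \sqrt{(g/k)\tanh(kh)}$ with $k = 2\pi/L$, against the non-dispersive shallow water speed $\sqrt{gh}$. This is textbook linear theory rather than the model used in this paper, and the 4000 m depth and the list of wavelengths are illustrative assumptions, not values from the simulations discussed here.

```python
import math

G = 9.81  # gravitational acceleration (m/s^2)


def speed_ratio(wavelength, depth):
    """Ratio of the linear-theory phase speed to the shallow water speed sqrt(g h).

    Linear (Airy) theory gives omega^2 = g k tanh(k h), so c = sqrt(g tanh(k h) / k);
    the shallow water equations effectively replace tanh(k h) by k h for every wavelength.
    """
    k = 2.0 * math.pi / wavelength
    c_linear = math.sqrt(G * math.tanh(k * depth) / k)
    c_shallow = math.sqrt(G * depth)
    return c_linear / c_shallow


if __name__ == "__main__":
    depth = 4000.0  # hypothetical open-ocean depth (m)
    for wavelength_km in (200, 40, 10, 4):
        r = speed_ratio(wavelength_km * 1e3, depth)
        print(f"L = {wavelength_km:4d} km  h/L = {depth / (wavelength_km * 1e3):5.2f}"
              f"  c/sqrt(gh) = {r:.3f}")
```

Waves much longer than the depth travel at essentially $\sqrt{gh}$, while waves only a few times longer than the depth travel noticeably slower and spread out by wavelength, an effect a non-dispersive model cannot represent.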


2 Simulations of Airburst-Generated Tsunamis 


2.1 Background 


The simulations we present first use the open-source software package GeoClaw
[9]. GeoClaw solves the depth-averaged shallow water equations over bathymetry. It
uses a second-order finite volume scheme with a robust Riemann solver that handles
wetting and drying [5]. For trans-oceanic wave propagation, where coastal inundation
must also be captured, adaptive mesh refinement is essential. GeoClaw uses
patch-based mesh refinement, allowing grid cells on the order of kilometers in deep
water and on the order of meters on land. Other features, such as well-balancing
(an ocean at rest on non-flat bathymetry stays at rest) and a well-balanced,
conservative algorithm for adding and removing patches, are also part of GeoClaw.
Desktop-level parallelism using OpenMP has also been implemented.

There are no data from asteroid-generated tsunamis to use for benchmarking. We
note, however, that GeoClaw has undergone many benchmarking studies for
earthquake-generated tsunamis, most extensively in 2011 [7]. This set of
benchmarks was performed to allow GeoClaw to be used in hazard assessment work
funded by the U.S. National Tsunami Hazard Mitigation Program.

The shallow water equations can be derived from the incompressible, irrotational Euler equations using the long-wavelength scaling, that is, by assuming that the ratio $\epsilon = h/L \ll 1$. Here, $h$ is the depth of the water and $L$ is the length scale of interest. This scaling shows that the vertical ($z$) velocity of the water enters only at $O(\epsilon)$, and that the horizontal velocities are constant in the vertical direction to $O(\epsilon^2)$. Eliminating the need to compute the vertical velocity reduces the three-dimensional simulation to a much more affordable calculation involving only the horizontal velocities $u$ and $v$.

Ordinarily the pressure appears in the shallow water equations only through its gradient, so its reference value can be chosen arbitrarily. In our simulations, however, we need to match the pressure at the top of the water column with the atmospheric pressure produced by the asteroid blast wave. Re-deriving the shallow water equations while retaining the pressure yields the equation set (1) used in our simulations.
The other terms in (1) are $g$, gravity; $p_e$, the external atmospheric pressure at the water surface; and $\rho_w = 1025$ kg/m³, the density of salt water. $B(x, y)$ is the bathymetry (underwater topography, i.e., the elevation of the ocean floor). Note that the pressure forcing appears in non-conservative form, as does the bathymetry. In these equations, an ocean at rest with its surface at sea level has $h(x, y) = -B(x, y)$. This is often described using the water surface elevation $\eta(x, y) = h + B$, with sea level at $\eta = 0$. We have neglected the Coriolis force, which is often considered unimportant for tsunami propagation.

The term $D = g M^2 \sqrt{u^2 + v^2}/h^{4/3}$ is the drag, which is important in numerical simulations that include inundation. Here $M = 0.025$ is the Manning coefficient, which we take to be constant.
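For reference, one form of the system (1) consistent with the terms defined above is written out below. This is our own reconstruction in the standard GeoClaw form with a pressure forcing term added, not a verbatim copy of the original equation; in particular, the exact placement of the drag term should be checked against [2].

$$
\begin{aligned}
h_t + (hu)_x + (hv)_y &= 0,\\
(hu)_t + \left(hu^2 + \tfrac{1}{2} g h^2\right)_x + (huv)_y &= -g h B_x - \frac{h}{\rho_w}\,(p_e)_x - D\,hu,\\
(hv)_t + (huv)_x + \left(hv^2 + \tfrac{1}{2} g h^2\right)_y &= -g h B_y - \frac{h}{\rho_w}\,(p_e)_y - D\,hv.
\end{aligned}
$$

For an ocean at rest ($u = v = 0$, $\eta = h + B$ constant) under uniform $p_e$, the flux gradient $g h h_x$ exactly balances $-g h B_x$, which is the well-balancing property mentioned in Sect. 2.1.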

To simulate the equation set (1), the external pressure must be known. It is obtained from detailed simulations, performed by others in the ATAP project [1], of an asteroid entering the Earth's atmosphere with a given speed, angle, and material composition. The asteroid deposits its energy in the atmosphere, causing a blast wave. The ground pressure $p_e(x, y)$ is extracted from these simulations, and the width and amplitude of a Friedlander profile, an idealized blast wave profile, are fitted to the data. This functional form is then used for the pressure forcing in our simulations. For simplicity we use a radially symmetric source term, corresponding to a vertical entry angle for the asteroid. (We have also performed anisotropic simulations, with no change to our conclusions.) The blast wave in these simulations travels at 391.5 m/s, which we take to be constant. This is somewhat faster than the speed of sound in air.

Figure 2 shows a typical profile. A Friedlander profile has a characteristic width, the distance from the leading shock to the ensuing underpressure. The profile is used in the simulations as follows. At a given time $t$ in the simulation, each grid point needs the atmospheric pressure. If the leading blast wave travels at speed $s = 391.5$ m/s, then at time $t$ it has travelled a distance $d = 391.5\,t$ meters. If the grid point is farther than $d$ from the initial location of the blast wave, the pressure is unchanged from ambient. Otherwise, the pressure profile is evaluated at the grid point's distance behind the shock and fed to the solver. The blue curve in Fig. 2 shows the profile at 50 s. The overpressure amplitude at that time is approximately 100% of ambient pressure. It is zero ahead of the blast and decays toward the blast center. These values are used in Eq. (1).
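The lookup just described can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the paper: it assumes the textbook Friedlander shape, with overpressure proportional to (1 − ξ/W)·exp(−bξ/W) at distance ξ behind the shock, and it takes the amplitude A and width W as plain arguments, whereas in the simulations they come from the fit to the ATAP data. The decay parameter `b` and all numerical values used below are hypothetical.

```python
import math

BLAST_SPEED = 391.5  # m/s, constant speed of the leading shock (from the text)


def friedlander(xi, amplitude, width, b=1.0):
    """Idealized Friedlander overpressure at distance xi (m) behind the shock.

    Positive for 0 <= xi < width, zero at xi = width, and negative
    (underpressure) beyond. `b` is a hypothetical decay parameter.
    """
    return amplitude * (1.0 - xi / width) * math.exp(-b * xi / width)


def overpressure(r, t, amplitude, width, b=1.0):
    """Overpressure at radius r (m) from the blast center at time t (s)."""
    d = BLAST_SPEED * t  # distance travelled by the leading shock
    if r > d:            # shock has not arrived yet: ambient pressure
        return 0.0
    return friedlander(d - r, amplitude, width, b)
```

Calling `overpressure(r, t, A, W)` at every cell center at each time step gives the forcing $p_e$; the value is largest just behind the shock and turns negative once the distance behind the shock exceeds `W`.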

The simulation in Fig. 2 corresponds to a 250 Mt asteroid. This roughly corresponds to an asteroid 200 m in diameter entering the atmosphere at 20 km/s. Note that the maximum overpressure of the airburst is approximately 450% of ambient pressure. (The energy of an explosion is measured in megatons (Mt) of TNT, i.e., by the mass of TNT with equivalent destructive power; the same unit is used to quantify nuclear bombs.) For comparison, the explosion of Mount Saint Helens was estimated at 25-35 Mt. The largest volcanic explosion ever recorded, that of Mount Tambora, released approximately 10-20 Gt and caused global climate change and mass destruction. The airburst over Chelyabinsk was approximately 520 kt. The Tunguska event, the largest airburst of the previous century, is now thought to have been about 15-20 Mt.

Fig. 2 A typical blast wave profile is drawn at two times. The amplitude is fit with a sum of decaying exponentials and the profile is scaled to get the pressure forcing at a given time. This functional form is then used in numerical simulations

We point out that the length scales of the Friedlander profiles are significantly shorter than those of earthquake-generated tsunamis, which are typically on the order of 50-100 km. We return to this point in Sect. 3.


2.2 Analytical and Computational Results for Shallow Water Equations


In [2], we propose and analyze a one-dimensional model problem that helps describe 
the results seen in our simulations. The model problem first assumes that the pressure 
disturbance is a traveling wave and then builds on this to solve the problem where the 
pressure disturbance starts impulsively at time zero. Of course, the actual pressure disturbance decays in time and generates further waves as its amplitude changes, but the initial waves are the strongest and most important.

When the pressure pulse from the airburst hits the water, it generates two distinct waves with different speeds. One is tied to the pressure pulse and moves with speed $s_p$; the other is the gravity wave, moving with speed $s_g$. What we call the response wave is an instantaneous disturbance of the sea surface that responds directly to the amplitude of the moving pressure pulse and propagates at the same speed, $s_p = 391.5$ m/s (this speed was called $s$ above; we change notation here to indicate a response to the pressure forcing).

Our analysis shows the following relationship between the response wave amplitude $h_r$ and the pressure disturbance $p_e$:

$$ h_r = \frac{h_0\, p_e}{\rho_w\,(s_p^2 - s_g^2)}. \qquad (2) $$


In (2), $h_0$ is the undisturbed depth of the water (i.e., where $\eta = 0$). This shows that the response wave is stronger in deeper water (almost linearly, since $s_g$ also depends on $h_0$). For an overpressure of 4.5 atmospheres at a depth of 3 km, the response wave would have an initial height of approximately 10.8 m. This amplitude decays rapidly along with the strength of the blast wave. Note that the response wave has positive amplitude, since $p_e > 0$ and $s_p > s_g$. This is counterintuitive, since one would expect pushing down on the water to lower its height. For hurricanes, the pressure disturbance is negative and hurricanes travel more slowly than water waves, so the water height again increases, but in that case the result is intuitive.
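As a quick arithmetic check of the 10.8 m figure, Eq. (2) can be evaluated directly. The sketch assumes g = 9.81 m/s² and one standard atmosphere = 101,325 Pa, values the text does not state.

```python
import math

G = 9.81          # gravitational acceleration, m/s^2 (assumed)
P_ATM = 101325.0  # one standard atmosphere, Pa (assumed)
RHO_W = 1025.0    # density of salt water, kg/m^3
S_P = 391.5       # blast (response wave) speed, m/s


def gravity_wave_speed(h0):
    """Shallow water gravity wave speed s_g = sqrt(g h0)."""
    return math.sqrt(G * h0)


def response_wave_height(h0, pe, sp=S_P):
    """Eq. (2): h_r = h0 * pe / (rho_w * (sp^2 - sg^2))."""
    sg = gravity_wave_speed(h0)
    return h0 * pe / (RHO_W * (sp**2 - sg**2))


print(round(gravity_wave_speed(3000.0), 1))                 # about 171.6 (m/s)
print(round(response_wave_height(3000.0, 4.5 * P_ATM), 1))  # about 10.8 (m)
```

Because $s_g^2 = g h_0$ appears in the denominator, $h_r$ grows somewhat faster than linearly in $h_0$, which is the near-linear depth dependence noted above.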

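There are also gravity waves, which move at the slower speed $s_g = \sqrt{g h_0}$. When $h_0 = 3000$ m, this gravity wave moves at just over 171 m/s, less than half the speed of the response wave. The initial gravity waves can also be estimated by linearizing the model problem and solving the homogeneous equation. Writing the traveling pressure pulse as $p_e(x, t) = p_e\,\phi(x - s_p t)$, where $\phi$ is the normalized pulse shape, and starting from water at rest, we get

$$ \eta(x,t) = h_r\left[\phi(x - s_p t) - \tfrac{1}{2}\left(\tfrac{s_p}{s_g} + 1\right)\phi(x - s_g t) + \tfrac{1}{2}\left(\tfrac{s_p}{s_g} - 1\right)\phi(x + s_g t)\right]. \qquad (3) $$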


The first term in (3) is the response wave traveling at the blast wave speed $s_p$, and the next two are the gravity waves moving to the right and left with speed $s_g$. We see that their amplitudes also depend on the amplitude of the response wave.
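The structure of the impulsively started solution can be checked numerically. The sketch below evaluates our own derivation of the linearized shallow water solution for a pulse switched on at t = 0 over water at rest, η = h_r[φ(x − s_p t) − ½(s_p/s_g + 1)φ(x − s_g t) + ½(s_p/s_g − 1)φ(x + s_g t)], using a hypothetical 5 km Gaussian for the pulse shape φ and assuming g = 9.81 m/s². It confirms that η vanishes at t = 0 and that the right-going gravity wave is a depression.

```python
import math

G = 9.81        # gravitational acceleration, m/s^2 (assumed)
RHO_W = 1025.0  # density of salt water, kg/m^3
S_P = 391.5     # speed of the pressure pulse, m/s


def gaussian_pulse(m, width=5000.0):
    """Hypothetical normalized pulse shape phi(m); width in meters."""
    return math.exp(-0.5 * (m / width) ** 2)


def eta(x, t, h0, pe, phi=gaussian_pulse):
    """Linear surface elevation for a pulse started impulsively over water at rest."""
    sg = math.sqrt(G * h0)
    hr = h0 * pe / (RHO_W * (S_P**2 - sg**2))  # response amplitude, Eq. (2)
    right = 0.5 * (S_P / sg + 1.0)             # weight of right-going gravity wave
    left = 0.5 * (S_P / sg - 1.0)              # weight of left-going gravity wave
    return hr * (phi(x - S_P * t) - right * phi(x - sg * t) + left * phi(x + sg * t))
```

For $h_0 = 3$ km the right-going weight ½(s_p/s_g + 1) is about 1.64, so this idealized model predicts a leading depression, in line with the simulations described next; it does not account for the decay of the real blast wave.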

We next show results from two simulations at different distances from shore and ocean depths. More details on these simulations are given in [2]. The first set of simulations is located off the coast of Westport, Washington. This area has been well studied because of its proximity to the earthquake-prone Cascadia subduction zone, capable of M9 events. The blast was located 180 km from shore, about 30 km from the continental shelf, and the ocean was 2575 m deep beneath the blast. Figure 3 shows the region of interest.

Figure 4 shows three snapshots at intervals of 25 s after the blast. A black circle indicates the location of the blast wave; the red band just inside the circle is the response wave, and the gravity waves lie further inside the circle. Note that the leading gravity wave is a depression (negative amplitude). Bathymetry contours from −1000 m to −100 m are drawn to show the location of the continental shelf. Although the colorbar scale runs from −1 to 1 m, the response wave height near the blast exceeds 10 m.

Figure 5 shows zooms of the waves approaching shore (2000 s), about to hit the peninsula (3000 s), and mostly reflecting (4000 s), with some smaller waves entering Grays Harbor. Note that the landscape is better resolved as the waves approach, indicating that the refinement level has increased. The wave amplitudes have greatly decreased, and no inundation is observed. Note that the colorbar scale (in meters) has been reduced by a factor of 5 in these later plots.

Fig. 3 The first set of simulations has the blast located 180 km offshore from Westport, in 2575 m deep water, indicated by the purple star. The zoom shows the region of interest studied for inundation

Fig. 4 Westport simulations at intervals of 25 s after the blast. The waves spread symmetrically around the blast center. The largest wave is over 10 m at the start

Fig. 5 Selected times as gravity waves approach the Westport coastline. The zooms cover a changing region closer and closer to shore. No inundation is observed. Note that the colorbar scale is a factor of 5 smaller than in Fig. 4

Since the first set of results showed no inundation despite such a large blast, the second set places the blast much closer to shore. We locate the blast 30 km off the coast of Long Beach, California, an area with a lot of important infrastructure. Figure 6 shows the topography. The water at the center of the blast is 797 m deep.

Figure 7 shows three snapshots at intervals of 25 s after the blast. Several features are evident. The black circle, which indicates the location of the blast wave at that time, no longer coincides with the leading elevation of the response wave (the red contours). This is because the water becomes shallower as the blast wave approaches Catalina Island, so the instantaneous amplitude of the response wave decreases, as expected from Eq. (2). Notice also that the blast wave in the atmosphere passes over the island, and the response wave reappears once the blast is again over water. Once again we see that the gravity waves are mostly a depression.

Fig. 6 The second set of simulations has the blast located 30 km from Long Beach, in 797 m deep water, indicated by the red dot. The zoom shows the region of interest studied for inundation

Fig. 7 First row shows the computed solution for the Long Beach simulation at intervals of 25 s after the blast. The black circle indicates the location of the blast wave in air. Bottom row shows zooms near shore at two later times

At this distance from shore, the blast wave has not decayed much by the time it reaches land. The blast wave itself, rather than the ensuing tsunami, would be the main cause of casualties and damage. The zooms in Fig. 7 use more refinement than at early times. The breakwater is now resolved, and water approaches shore only through the breakwater gaps or around its edge. Since the port infrastructure is two meters high, there is still no flooding. A very small amount of flooding is seen along the river (not visible in these plots).

We performed a number of additional simulations in a variety of locations, bathymetries, and asteroid strengths, including one with 1 Gt of energy. We have not found any example in which an airburst caused significant onshore inundation. However, in the next section we examine whether the shallow water equations are an appropriate model for airburst-generated tsunamis, and compare the previous results with similar analyses and computations using the linearized Euler equations.


3 The Linearized Euler Equations 


As reviewed earlier, the shallow water equations are a long-wavelength approximation to the full 3D equations. Since the length scales of the Friedlander profile are on the order of 10 km, the ratio of water depth to length scale, $\epsilon = h/L \approx 0.4$ in a 4 km deep ocean, is not small. Closer to shore, where the water is shallower, the shallow water equations may be more appropriate. The length scales also determine the effect of dispersion, which is absent from the shallow water equations.
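One way to quantify the role of dispersion is the classical linear dispersion relation for incompressible water waves over constant depth, ω² = g k tanh(k h), a standard result that is not taken from this paper. The ratio of its phase speed to the shallow water speed √(g h) is √(tanh(kh)/(kh)); the sketch below evaluates it for the length scales mentioned above, assuming g = 9.81 m/s².

```python
import math

G = 9.81  # gravitational acceleration, m/s^2 (assumed)


def phase_speed(wavelength, depth):
    """Linear (Airy) phase speed sqrt(g * tanh(k h) / k), with k = 2*pi/L."""
    k = 2.0 * math.pi / wavelength
    return math.sqrt(G * math.tanh(k * depth) / k)


def dispersion_ratio(wavelength, depth):
    """Phase speed relative to the non-dispersive shallow water speed sqrt(g h)."""
    return phase_speed(wavelength, depth) / math.sqrt(G * depth)


for L in (10e3, 50e3, 100e3):
    print(f"L = {L / 1e3:5.0f} km, h = 4 km: c / sqrt(gh) = {dispersion_ratio(L, 4000.0):.2f}")
```

In a 4 km deep ocean the ratio is about 0.63 at L = 10 km but about 0.99 at L = 100 km, so waves at airburst length scales are strongly dispersive while earthquake-scale waves are nearly non-dispersive.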

To examine this more closely, we compare the results of the previous section, obtained with the shallow water equations, with those of the linearized Euler equations, which bring in the effects of both compressibility and dispersion. The linearized equations have the advantage that the free surface boundary condition of the full Euler equations becomes a simple boundary condition when linearized, so neither the free water surface nor the atmosphere has to be tracked or computed. Unfortunately, the vertical direction must then be discretized along with the two horizontal directions, which makes the computation much more expensive than with a depth-averaged equation set.


3.1 Analytical and Computational Results for Linearized Euler


Again, we first review the results from [2] for our model traveling wave problem, now for the linearized Euler equations (which are also derived there). Unlike the shallow water equations, whose response does not depend on wavelength, the Euler equations do exhibit such a dependence. We first present results for a single wave number $k$, with length scale $L = 2\pi/k$. We then apply these results to a function with many frequencies. Finally, we show some preliminary results of radially symmetric simulations confirming the conclusions of the model problem.

If we write the external pressure forcing as $p_e(m) = A_p e^{ikm}$, where $m = x - s_p t$ is the traveling wave variable of our model problem, we can compute the response coefficients as functions of the wave number $k$ and the amplitude $A_p$, i.e., $h(m) = \hat{h}\, e^{ikm}$, and similarly for the horizontal velocity $u$ and now also the vertical velocity $w$. The traveling wave problem can no longer be solved exactly, but it can be evaluated numerically. In Fig. 8, we evaluate the solution of the model problem for an ocean depth of 4 km and an overpressure amplitude of 1 atmosphere. We take the speed of sound in water to be $c_w = 1500$ m/s and the density $\rho_w = 1025$ kg/m³. Figure 8 also shows results for an artificially large sound speed $c_w$, in order to approach the incompressible limit.

Fig. 8 Comparison of response wave amplitudes as a function of length scale for the shallow water and linearized Euler equations. These were evaluated for a 4 km deep ocean and 1 atm overpressure. At smaller length scales the dominant difference is due to dispersion, not to compressibility

The green curve in Fig. 8 is the shallow water amplitude of the response wave. It is constant since, as expected, it does not depend on wave number. We can also compute the nonlinear shallow water response, which is done in [2]; it overlays the linearized response. The blue curve is the linearized Euler result using the real sound speed of water. It does not approach the shallow water curve. The red curve uses the artificially large sound speed, which approaches the incompressible limit; this curve does approach the shallow water curve, giving us more confidence in the results. The difference of roughly 10% that remains between the blue linearized Euler curve and the shallow water curve at long wavelengths is what we call the effect of compressibility. However, at the length scales of interest for airburst-generated tsunamis, the curves differ by more than a factor of 2. We conclude that dispersion is a much more important effect.
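The qualitative trend can be reproduced in the incompressible limit, where the traveling wave problem has a closed form under linear potential flow theory over a flat bottom: a surface pressure $A_p e^{ikm}$ moving at speed $s$ produces an elevation of amplitude $A_p/[\rho_w(s^2 k \coth(kh) - g)]$, which reduces to Eq. (2) as $k \to 0$. This is our own standard derivation, not the compressible multiplier plotted in Fig. 8; g = 9.81 m/s² and 1 atm = 101,325 Pa are assumed.

```python
import math

G = 9.81          # gravitational acceleration, m/s^2 (assumed)
RHO_W = 1025.0    # density of salt water, kg/m^3
P_ATM = 101325.0  # 1 atm overpressure, Pa (assumed)
S_P = 391.5       # speed of the pressure pulse, m/s


def shallow_water_response(h0, pe=P_ATM, s=S_P):
    """Eq. (2): independent of the wave number."""
    return h0 * pe / (RHO_W * (s**2 - G * h0))


def potential_flow_response(wavelength, h0, pe=P_ATM, s=S_P):
    """Incompressible linear response to pe * exp(i k m), with k = 2*pi/wavelength."""
    k = 2.0 * math.pi / wavelength
    k_coth = k / math.tanh(k * h0)  # tends to 1/h0 as k -> 0
    return pe / (RHO_W * (s**2 * k_coth - G))


for L in (5e3, 10e3, 20e3, 50e3, 200e3):
    ratio = potential_flow_response(L, 4000.0) / shallow_water_response(4000.0)
    print(f"L = {L / 1e3:5.0f} km: incompressible / shallow water = {ratio:.2f}")
```

In 4 km of water the ratio falls to roughly one third at L = 10 km, the same qualitative loss of amplitude at short length scales that Fig. 8 attributes to dispersion.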

Figure 8 showed the amplitude response to a single-frequency pressure perturbation. In Fig. 9 we evaluate the response to a Gaussian pressure pulse $p_e(m) = \exp(-0.5\,(m/5)^2)$, which contains all frequencies. We take the Fourier transform, multiply each frequency by the Fourier multiplier shown in Fig. 8, and transform back, so this is still a static response. The left panel shows results in 4 km deep water, and the right panel in 1 km deep water. Again we see that compressibility accounts for a smaller portion of the height difference between the shallow water and linearized Euler results than dispersion. Note also that the Euler results are broadened, an indication of dispersion. As expected, the results in shallower water agree better. Fortunately, in all cases the shallow water equations overestimate the response obtained when compressibility and dispersion are included.

Fig. 9 Comparison of responses to a Gaussian pressure pulse in 4 km deep water (left) and 1 km deep water (right)
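The Fourier-multiplier procedure is easy to sketch with NumPy's FFT. Because the compressible multiplier of Fig. 8 is only available numerically, this sketch substitutes the incompressible potential-flow multiplier $1/[\rho_w(s^2 k\coth(kh) - g)]$, a standard linear-theory result and our assumption here; the grid, the 5 km pulse width and the 1 atm amplitude are also hypothetical choices. Since the $k = 0$ multiplier equals the constant shallow water factor of Eq. (2), both responses displace the same total volume.

```python
import numpy as np

G, RHO_W = 9.81, 1025.0       # m/s^2 (assumed), kg/m^3
S_P, P_ATM = 391.5, 101325.0  # pulse speed m/s, 1 atm in Pa (assumed)


def static_responses(h0, width=5000.0, n=4096, length=400e3):
    """Return (x, eta_sw, eta_pf) for a Gaussian pressure pulse on a periodic grid."""
    x = (np.arange(n) - n // 2) * (length / n)
    pressure = P_ATM * np.exp(-0.5 * (x / width) ** 2)

    # Shallow water: a constant multiplier, Eq. (2).
    eta_sw = h0 * pressure / (RHO_W * (S_P**2 - G * h0))

    # Incompressible potential flow: multiplier 1 / (rho_w (s^2 k coth(kh) - g)).
    k = np.abs(2.0 * np.pi * np.fft.fftfreq(n, d=length / n))
    kh = k * h0
    safe_kh = np.where(kh > 0.0, kh, 1.0)  # avoid 0/0 at k = 0
    k_coth = np.where(kh > 0.0, k / np.tanh(safe_kh), 1.0 / h0)
    multiplier = 1.0 / (RHO_W * (S_P**2 * k_coth - G))
    eta_pf = np.real(np.fft.ifft(multiplier * np.fft.fft(pressure)))
    return x, eta_sw, eta_pf


for depth in (4000.0, 1000.0):
    _, sw, pf = static_responses(depth)
    print(f"h0 = {depth / 1e3:.0f} km: peak ratio = {pf.max() / sw.max():.2f}")
```

The potential-flow peak is lower than the shallow water peak, and the gap is much smaller in 1 km of water, mirroring the trend described above for Fig. 9.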

Finally, in Figs. 10 and 11, taken from [3], we show snapshots from time-dependent simulations of the 250 Mt airburst and compare linearized Euler (denoted AG, for acoustic with gravity, in the legends), shallow water, and two different Boussinesq² models [8, 14]. We thank Popinet for the use of Basilisk in simulations with the Serre-Green-Naghdi (SGN) equations, and Jiwan Kim for the use of BoussClaw, which uses the Madsen-Sørensen equations [10].

We first show results in a 4 km deep flat ocean, then in a 1 km deep one. Note that the scales differ between the two figures. Also, since the tsunami travels more slowly in shallower water, we show the 1 km results only every 100 s. Note that the leading shallow water gravity wave is a depression in both simulations. Also note that the two Boussinesq simulations agree better with each other than with the linearized Euler runs. The SGN simulation is two-dimensional and plotted as a function of radius, and hence is much noisier than the other simulations, which were one-dimensional radially symmetric computations. We point out that Boussinesq waves decay in inverse proportion to the distance traveled, whereas shallow water waves decay in inverse proportion to the square root of the distance. Finally, all four codes show the same response wave behavior, an elevation of sea level, albeit with different magnitudes.

We do not think that the depth-averaged equations are suitable for simulating the initiation of gravity waves, since there is significant variation in the vertical velocity. It does seem, however, that depth-averaged equations can be used to propagate the waves once they have been initiated by a higher-fidelity simulation. This has been demonstrated in [3]. We do not yet know how this translates into shoreline inundation. Preliminary evidence indicates that the shallow water model overestimates the run-in due to airbursts, as it did when predicting the height of the response wave, but more evidence is needed for this hypothesis.

² Generally speaking, the Boussinesq equations keep the next term in the long-wavelength expansion that leads to the shallow water equations. They are depth-averaged, but much more complicated than the shallow water equations, since they include dispersive terms with third-order derivatives. We do not describe them further.

Fig. 10 Comparison of the initial generation of an airburst tsunami using all 4 models in a 4 km deep ocean. Selected frames every 50 s. After 300 s, the SGN and BoussClaw results match linearized Euler in the leading gravity wave, but not (yet) the rest. The SWE model does not generate gravity waves that match at any of the times

Fig. 11 Comparison of airburst-generated tsunamis using all 4 models in a 1 km deep ocean. Selected frames every 100 s. After about 200 s, SGN and BoussClaw match the linearized Euler results in the leading gravity wave, and by 400 s the next few waves are very similar, though the amplitude is not quite right. The shallow water model still has very different waves


4 Conclusions 


We have presented several numerical simulations of the shallow water equations in response to a 250 Mt airburst. The results are further explained using a traveling wave model problem, for both the shallow water and the linearized Euler equations. All results show that there is no significant water response (in either the response wave or the gravity waves) to the airburst. The most serious danger from an airburst would be the blast itself, if close enough to the blast center, rather than the water waves it generates.

We also found that, because of the shorter wavelengths of an airburst, the shallow water equations do not simulate the propagation of these waves accurately compared with Boussinesq or linearized Euler models. However, it may be possible to use the shallow water equations to estimate shoreline inundation. This is a matter for future study.
Abstract This presentation deals with four case studies in environmental and industrial mathematics developed by the mathematical engineering research group (mat+i) of the University of Santiago de Compostela and the Technological Institute for Industrial Mathematics (ITMATI). The first case involves environmental fluid mechanics: optimizing the location of submarine outfalls on the coast. This work, related to the shallow water equations with variable depth, led us to develop a theory for the numerical treatment of source terms in nonlinear first-order hyperbolic balance laws. More recently, these techniques have been applied to solve the Euler equations with source terms arising in the numerical simulation of gas transportation networks when topography is taken into account through the gravity force. The last two problems concern electromagnetism. One of them is related to the nondestructive testing of car parts using magnetic nanoparticles (the so-called magnetic particle inspection, MPI) and involves the mathematical modelling of magnetic hysteresis to simulate demagnetization. Finally, we present a mathematical procedure to reduce the computing time needed to reach the steady state of an induction electric machine in transient numerical simulations.


1 Introduction 


Four case studies developed by the Research Group in Mathematical Engineering 
from the University of Santiago de Compostela (USC) and the Technological Institute 
for Industrial Mathematics (ITMATI) are considered. Two of them are related to 
fluid mechanics. The first one was developed in the framework of a contract with the 
Ministry of Public Works of Galicia and concerns shallow water flows in a domain 
with variable depth. The second one deals with gas flow in transport networks and was carried out for the Reganosa company. From the mathematical point of view, both are modelled with systems of nonlinear hyperbolic partial differential equations with source terms, and the goal is to devise suitable finite volume discretizations of the source terms.

The other two case studies concern electromagnetism. The goal of the first one, funded by the company CIE Automotive, is the numerical simulation of magnetization and demagnetization processes in magnetic particle inspection procedures. The last case study is related to the numerical simulation of electric machines with a view to their optimal design. The underlying mathematical problems are, respectively, the mathematical and numerical analysis of models for electromagnetic hysteresis, and methods to determine appropriate initial conditions for transient electromagnetic simulations so that the steady state is attained as soon as possible.


2 Environmental Flows. The Shallow Water Equations 


The technical goal of this work, commissioned from our research team by the Galician government in the 1980s, was to determine the optimal location of submarine outfalls along the coast of the Galician rias. For this purpose several steps were carried out, involving modelling, simulation and optimal control:

• Computing the velocity field due to tidal currents and wind, which was done using the shallow water equations

• Solving a mathematical model for the evolution and dispersion of some pollution indicators, such as fecal coliforms or biochemical oxygen demand (BOD)

• Formulating and solving constrained optimal control problems related to outfall position and the management of wastewater treatment systems.


Regarding the first step, since the shallow water equations form a nonlinear system of hyperbolic partial differential equations, numerical methods developed in the 1980s for the Euler equations can be applied to their numerical solution, namely finite volume methods combined with approximate Riemann solvers. The unexpected problem we found was related to the discretization of the source term that is present in the shallow water equations when the bottom is not flat. To give some insight we refer to Fig. 1: we solved the shallow water equations using a finite volume scheme with the van Leer Q-scheme as approximate Riemann solver for upwinding the flux term, and a centred scheme to discretize the source term arising from the non-flat bottom. We considered a static configuration in a closed channel; more precisely, the initial condition (and hence the solution at all times) corresponds to water at rest. The left plot shows the computed water level, which is a quite good approximation. However, the right plot shows the computed velocity, which varies between about −60 and 80 m/s, whereas the exact velocity is zero.
Fig. 1 Shallow water. Centred discretization of the source term. Computed water level (left) and computed velocity (right). Notice that the zero line is the result of a numerical simulation using [10]


Motivated by this problem, in an early paper [10] we developed a general methodology to discretize source terms in nonlinear systems of first-order hyperbolic partial differential equations. In particular, our methods solve the previous static problem exactly. This paper is considered a seminal work in the theory of well-balanced schemes for the numerical solution of conservation laws with source terms, an active field of research in recent years. Moreover, thirty years later, our research group applied this methodology to a different problem: the Euler equations with gravity, more specifically, the numerical simulation of gas transportation networks on non-flat topography.
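To make the difference concrete, here is a minimal one-dimensional sketch, not the authors' code, of a lake at rest over a bump computed with a Roe-type upwind flux and two treatments of the bottom source. In the spirit of [10], the upwinded version decomposes the interface source on the eigenvectors of the flux Jacobian and sends each part to the left or right cell according to the sign of the corresponding eigenvalue, exactly as the flux is upwinded; the centred version simply splits it evenly. The interface averages (Roe velocity, arithmetic mean depth), the reflective walls, the grid and the bump are our own assumptions, and the scheme used for Fig. 1 may differ in such details.

```python
import numpy as np

G = 9.81  # gravitational acceleration, m/s^2


def physical_flux(h, q):
    """Shallow water flux F(U) = (q, q^2/h + g h^2/2) for U = (h, q)."""
    return np.stack([q, q * q / h + 0.5 * G * h**2])


def step(h, q, B, dx, dt, upwind_source=True):
    """One explicit step of a first-order Roe-type scheme; eta = h + B."""
    # Reflective walls: ghost cells copy h and B and flip the discharge q.
    hg = np.concatenate([h[:1], h, h[-1:]])
    qg = np.concatenate([-q[:1], q, -q[-1:]])
    Bg = np.concatenate([B[:1], B, B[-1:]])
    hL, hR, qL, qR = hg[:-1], hg[1:], qg[:-1], qg[1:]

    # Interface averages: Roe velocity and arithmetic mean depth.
    sL, sR = np.sqrt(hL), np.sqrt(hR)
    u = (sL * qL / hL + sR * qR / hR) / (sL + sR)
    hbar = 0.5 * (hL + hR)
    c = np.sqrt(G * hbar)
    lam1, lam2 = u - c, u + c

    def apply(f, v):
        """Compute f(A) v using eigenvectors r_k = (1, lam_k) of the Jacobian A."""
        a1 = (lam2 * v[0] - v[1]) / (lam2 - lam1)
        a2 = (v[1] - lam1 * v[0]) / (lam2 - lam1)
        f1, f2 = f(lam1), f(lam2)
        return np.stack([f1 * a1 + f2 * a2, f1 * a1 * lam1 + f2 * a2 * lam2])

    dU = np.stack([hR - hL, qR - qL])
    flux = 0.5 * (physical_flux(hL, qL) + physical_flux(hR, qR)) - 0.5 * apply(np.abs, dU)

    # Interface source integrated over one cell width: (0, -g * hbar * dB).
    S = np.stack([np.zeros_like(hbar), -G * hbar * (Bg[1:] - Bg[:-1])])
    if upwind_source:
        sgn_S = apply(np.sign, S)
        S_left, S_right = 0.5 * (S - sgn_S), 0.5 * (S + sgn_S)
    else:
        S_left = S_right = 0.5 * S  # centred: half to each neighbouring cell

    # Cell i: its right interface has index i + 1, its left interface index i.
    res = flux[:, 1:] - flux[:, :-1] - S_left[:, 1:] - S_right[:, :-1]
    return h - dt / dx * res[0], q - dt / dx * res[1]


def max_speed_at_rest(steps, upwind_source):
    """Start from a lake at rest over a bump and return max |u| after `steps`."""
    n, dx = 100, 10.0
    x = (np.arange(n) + 0.5) * dx
    B = -10.0 + 5.0 * np.exp(-((x - 500.0) / 100.0) ** 2)  # hypothetical bottom
    h, q = -B, np.zeros(n)  # eta = h + B = 0 and q = 0
    dt = 0.4 * dx / np.sqrt(G * h.max())
    for _ in range(steps):
        h, q = step(h, q, B, dx, dt, upwind_source)
    return np.max(np.abs(q / h))
```

With the upwinded source the lake at rest is preserved to round-off, whereas the centred splitting leaves the numerical diffusion of the flux acting on the depth and sets the water in motion, the same kind of failure shown in Fig. 1.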


3 Gas Network Simulation 


This industrial request from the Reganosa company consisted of writing software for the transient numerical simulation of a gas transport network. Figure 2 shows the high-pressure Spanish gas network. Besides a large number of pipes, it includes entry (emission) and exit (consumption) points, underground storage facilities and, more importantly, compressor stations. The latter are needed to compensate for the pressure drop along the network caused by viscous friction of the gas on the pipe walls.


3.1 Mathematical Modelling: Homogeneous Gas Flow in a Pipe


Fig. 2 Spanish gas transport network

The mathematical model for gas flow in a pipe consists of the Navier-Stokes equations for compressible flows. More precisely, it involves the mass, momentum and energy conservation laws together with some additional equations: the equations of state for real gases, and the Darcy-Weisbach law for turbulent friction between the gas and the pipe walls, combined with the Colebrook equation to compute the friction factor. Since the pipe length is much larger than its diameter, we can use a 1D model:


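$$
\begin{aligned}
&\frac{\partial \rho}{\partial t}(x,t) + \frac{\partial (\rho v)}{\partial x}(x,t) = 0,\\
&\frac{\partial (\rho v)}{\partial t}(x,t) + \frac{\partial (\rho v^2 + p)}{\partial x}(x,t) = \underbrace{-\frac{\lambda}{2D}\,\rho|v|v}_{\text{friction}}\;\underbrace{-\,g\rho(x,t)\,h'(x)}_{\text{gravity force}},\\
&\frac{\partial (\rho E)}{\partial t}(x,t) + \frac{\partial (\rho E v + p v)}{\partial x}(x,t) = \underbrace{-\,g\rho(x,t)\,v(x,t)\,h'(x)}_{\text{power of gravity force}} + \underbrace{\frac{4}{D}\,k\,\big(\theta_{\mathrm{ext}}(x,t) - \theta(x,t)\big)}_{\text{heat exchange}},
\end{aligned}
$$

where $k$ denotes the heat transfer coefficient.

Thermodynamic equation of state: $p = Z(\theta, \rho)\,\rho R \theta$

Caloric equation of state: $e = E - \tfrac{1}{2} v^2$, with $e = \varepsilon(\theta) = \varepsilon(\theta_0) + \int_{\theta_0}^{\theta} C_v(s)\,ds$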


Some Case Studies in Environmental and Industrial Mathematics 23 


$\theta$ is the absolute temperature (K)
$p$ is the pressure (Pa)
$Z(\theta, \rho)$ is the compressibility factor (dimensionless)
$E$ is the specific total energy (J/kg)
$e$ is the specific internal energy (J/kg)
$\theta_0$ is a reference temperature (K)
$C_v(\theta)$ is the specific heat at constant volume (J/(kg K)).


3.2 Numerical Solution: One Single Pipe with Homogeneous Gas


Numerical methods for solving the compressible Euler equations for homogeneous mixtures of perfect gases without source terms have been well established since the 1980s. For instance, one can use a simple first-order method consisting of the explicit Euler scheme for time discretization, a finite volume method for space discretization, and approximate Riemann solvers (e.g., van Leer's Q-scheme) for upwind discretization of the flux term (see, for instance, [24]). However, when source terms are present (e.g., the gravity term with variable height), the numerics are more difficult and, as for the shallow water equations, the use of well-balanced schemes is mandatory. This means that the discretization of the source terms also needs some upwinding. In recent years many papers devoted to the numerical solution of the Euler equations with gravity have been written; see, for instance, [13-15, 23, 25, 27].

To highlight the need for an upwind discretization of the source terms, we consider the following very simple test problem: $h(x)$ in the gravity source term is an arbitrary function, and we look for a static isothermal solution, i.e., one satisfying $v(x) = 0$ and $\theta(x) = \theta_{\mathrm{ext}}$ for all $x \in (0, L)$. It is easy to see that, for a perfect gas ($Z \equiv 1$), the exact solution is given by $v(x) = 0$ and

$$ \rho(x) = \rho_0 \exp\left(-\frac{g}{R\,\theta_{\mathrm{ext}}}\,\big(h(x) - h_0\big)\right), \qquad p(x) = R\,\theta_{\mathrm{ext}}\,\rho_0 \exp\left(-\frac{g}{R\,\theta_{\mathrm{ext}}}\,\big(h(x) - h_0\big)\right), $$

where $\rho_0$ is the density at a reference point of height $h_0$. For the data given in Table 1, the computed mass flow rate is shown in Fig. 3, together with the exact solution, which is zero. The computed result is very poor, oscillating between about −10 and 10 kg/(m² s). By using the general methodology developed in [10], we proposed in [7] a discretization of the gravity term leading to a well-balanced scheme that reproduces the zero solution exactly.
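The exact static solution can be checked against the hydrostatic balance $p'(x) = -g\rho(x)h'(x)$, to which the momentum equation reduces when $v = 0$. The sketch below does so for a perfect gas with the R, θ_ext and L of Table 1; since the argument of the sine in Table 1 is not legible, the height function $h(x) = 1000\sin(2\pi x/L)$, the reference density of 50 kg/m³ and g = 9.81 m/s² are hypothetical.

```python
import math

G = 9.81            # gravitational acceleration, m/s^2 (assumed)
R = 480.0           # specific gas constant, J/(kg K) (Table 1)
THETA_EXT = 288.15  # external temperature, K (Table 1)
L = 40_000.0        # pipe length, m (Table 1)


def height(x):
    """Hypothetical pipe height profile h(x) in meters."""
    return 1000.0 * math.sin(2.0 * math.pi * x / L)


def height_slope(x):
    """Derivative h'(x) of the hypothetical profile."""
    return 1000.0 * (2.0 * math.pi / L) * math.cos(2.0 * math.pi * x / L)


def static_state(x, rho0=50.0, x0=0.0):
    """Exact static isothermal perfect-gas solution (v, rho, p) at position x."""
    factor = math.exp(-G * (height(x) - height(x0)) / (R * THETA_EXT))
    rho = rho0 * factor
    return 0.0, rho, R * THETA_EXT * rho


def hydrostatic_residual(x, dx=1.0):
    """Central-difference p'(x) + g rho(x) h'(x); vanishes for the exact solution."""
    _, rho, _ = static_state(x)
    dpdx = (static_state(x + dx)[2] - static_state(x - dx)[2]) / (2.0 * dx)
    return dpdx + G * rho * height_slope(x)
```

Any discretization that computes a nonzero mass flow rate for this state, as the centred one in Fig. 3 does, is not well balanced.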


Table 1 Data for static isothermal test

R (J/(kg K))   θ_ext (K)   h(x) (m)        L (m)
480            288.15      1000 sin(… x)   40,000
Fig. 3 Mass flux (kg/(m² s)) computed with a centred discretization of the source terms (black) and exact (red). The horizontal axis is the distance from the origin of the pipe


3.3 Network with Heterogeneous Gas 


Simulation of heterogeneous gas flowing in a network is more difficult, since new problems
arise, such as junction modelling and gas quality simulation. These issues have been addressed
in papers [8, 9].


3.4 Experimental Validation in a Real Small Network 


The code has been used for a small gas network and the results have been compared 
to real measurements. The network can be seen in Fig. 4. 

Topography is quite irregular as can be seen in Fig. 5. Results and measurements 
corresponding to mass flow rate and pressure for some particular nodes are shown 
in Figs. 6 and 7, respectively. 


4 Non-destructive Testing: Magnetic Particle Inspection 
(MPI) 


MPI is a non-destructive testing technique to detect near-surface defects in ferromagnetic
pieces. The process is as follows: first, the workpiece is magnetized. Then, the
presence of a surface discontinuity in the material allows the magnetic flux to leak,
since air cannot support as much magnetic field per unit volume as metals. In order
to identify a leak, ferrous particles, either dry or in a wet suspension, are applied
to the workpiece. They are attracted to an area of flux leakage and form what
is called an indication, which is evaluated to determine its nature. Since cracks are
more easily detected when they are perpendicular to the induced field, two magnetizations
are made: circular and longitudinal. After inspection, a final demagnetization
step is required for subsequent processing of the workpiece. In the next subsection
we introduce an axisymmetric model for circular magnetization and present some
numerical results (Figs. 8 and 9). Further details can be found in Refs. [2, 4-6].

Fig. 4 The Reganosa network (Galicia, Spain)

Fig. 5 Height function h (m) for edge #9
Fig. 6 Mass flow rate at node 01A. Blue: real measurement. Red: computed with a homogeneous
gas model. Green: computed with a heterogeneous gas model

Fig. 7 Pressure at node I-015. Blue: real measurement. Red: computed with a homogeneous gas
model. Green: computed with a heterogeneous gas model

Fig. 8 Magnetic particle inspection

Fig. 9 Crack indication. Circular magnetization. Longitudinal magnetization


Fig. 10 Circular 
magnetization 


4.1 Circular Magnetization. Axisymmetric Model 


Let us introduce a mathematical model for circular magnetization. Thanks to axisym- 
metry, it can be written on a meridional section (see Fig. 10). 

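Given I(t), the magnetizing or demagnetizing current, and an initial condition
H_0, find H_θ in Ω × (t_0, T] such that

$$\frac{\partial B_\theta}{\partial t}\,\mathbf{e}_\theta + \operatorname{curl}\left(\frac{1}{\sigma}\operatorname{curl}(H_\theta\,\mathbf{e}_\theta)\right) = \mathbf{0} \quad \text{in } \Omega \times (t_0, T],$$

$$H_\theta(0, z, t) = 0 \quad \text{on } (0, L) \times (t_0, T],$$

$$H_\theta(R_s(z), z, t) = \frac{I(t)}{2\pi R_s(z)} \quad \text{on } (0, L) \times (t_0, T],$$

$$\frac{\partial H_\theta}{\partial z} = 0 \quad \text{on } (\Gamma_u \cup \Gamma_l) \times (t_0, T],$$

$$H_\theta(r, z, t_0) = H_0(r, z) \quad \text{in } \Omega,$$

and

$$B_\theta(\mathbf{x}, t) = \mathcal{B}(H_\theta(\mathbf{x}, \cdot), \xi(\mathbf{x}))(t),$$

where 𝓑 is a scalar hysteresis operator to be defined later.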


4.2 Hysteresis Modelling 


Mathematical modelling of hysteresis is now a well-established subject (see, for
instance, the reference books [11, 12, 17-19, 26]). Let us summarize the main
issues of the theory. We consider a system whose state is characterized by two scalar
variables, u and w, which are assumed to depend continuously on time t. In our case
u = H_θ and w = B_θ. The value of w(t) is determined by u(t) and by the values of
u(τ) for τ < t. Let us introduce some basic definitions and notations (Fig. 11).

At any instant t, w(t) depends on the previous evolution of u and on an initial
state of the system, to be called ξ. We can formalize this as follows:

$$w(t) = \mathcal{F}(u, \xi)(t) \quad \forall t \in [0, T].$$


Fig. 11 Hysteresis major and minor loops

Fig. 12 Preisach triangle (left) and an example of Preisach function (right)


Here 𝓕(·, ξ) represents an operator between suitable spaces of time-dependent
functions. Notice that 𝓕 is non-local in time. A particular example of hysteresis operator
is the Preisach operator:

$$\mathcal{F} : C^0([0, T]) \times Y \to C^0([0, T]),$$

$$[\mathcal{F}(u, \xi)](t) := \int_{\mathcal{T}} p(\rho)\,[h_\rho(u, \xi(\rho))](t)\,d\rho,$$

where 𝒯 is the Preisach triangle, 0 ≤ p ∈ L¹(𝒯) is the Preisach function, which is
determined by physical experiments for each material (see Fig. 12), h_ρ is the relay
function (see Fig. 13) and ξ : 𝒯 → {−1, 1} is a Borel measurable function representing the
initial magnetic state.

The classical Preisach model is built with the so-called rate-independent relay:
let us fix any pair ρ := (ρ₁, ρ₂) ∈ ℝ², ρ₁ < ρ₂. For any continuous function u :
[0, T] → ℝ and any ξ ∈ {−1, 1}, we define h_ρ(u, ξ) as follows.

Let t₁ < … < t_m ≤ t be such that u(t_i) ∈ {ρ₁, ρ₂}. If {t_i} = ∅ or t = 0, then

$$[h_\rho(u, \xi)](t) = \begin{cases} -1 & \text{if } u(t) \le \rho_1, \\ \xi & \text{if } \rho_1 < u(t) < \rho_2, \\ 1 & \text{if } u(t) \ge \rho_2; \end{cases}$$

else

$$[h_\rho(u, \xi)](t) = \begin{cases} 1 & \text{if } u(t_m) = \rho_2, \\ -1 & \text{if } u(t_m) = \rho_1. \end{cases}$$
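The relay definition can be sketched in Python for a sampled input. This is an illustrative implementation, not taken from the cited references, and it assumes the samples are fine enough that every threshold crossing of the continuous input u shows up as a sample at or beyond that threshold; the function name `relay` is hypothetical.

```python
def relay(u_samples, rho1, rho2, xi):
    """Rate-independent relay h_rho(u, xi) evaluated at the last sample time.

    u_samples: values of a continuous input u at increasing times t_0 < ... < t_N
               (assumed fine enough that every threshold crossing is visible)
    rho1 < rho2: switching thresholds; xi in {-1, 1}: initial state of the relay
    """
    if not rho1 < rho2:
        raise ValueError("the relay requires rho1 < rho2")
    # The output is fixed by the most recent time u left the open interval (rho1, rho2):
    # +1 if it was at or above rho2, -1 if it was at or below rho1.
    for value in reversed(u_samples):
        if value >= rho2:
            return 1
        if value <= rho1:
            return -1
    # u never reached either threshold, so the relay keeps its initial state xi.
    return xi


# Going above rho2 and coming back inside (rho1, rho2) leaves the relay switched on.
print(relay([0.0, 2.5, 1.5], 1.0, 2.0, -1))
```

Scanning backwards implements "the last t_m with u(t_m) ∈ {ρ₁, ρ₂}" directly; the first case of the definition (no crossing, or t = 0) corresponds to the loop falling through to ξ or stopping at the current value.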


If we split 𝒯 = S⁺_u(t) ∪ S⁻_u(t), where

$$S^{\pm}_u(t) := \{\rho \in \mathcal{T} : [h_\rho(u, \xi(\rho))](t) = \pm 1\},$$

then

$$[\mathcal{F}(u, \xi)](t) = \int_{S^+_u(t)} p(\rho)\,d\rho - \int_{S^-_u(t)} p(\rho)\,d\rho.$$

Fig. 13 Classical relay operator

Fig. 14 Input u(t) (left) and its corresponding splitting of Preisach triangle (right)
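The splitting formula suggests a simple discrete Preisach model: cover the triangle with grid cells, attach one relay to each cell and approximate both integrals by sums of p(ρ) times the cell area. The Python sketch below is illustrative only; the grid on [−1, 1]², the uniform density p = 1 in the usage example and the function names are assumptions, and the input samples are assumed to contain the local extrema of u.

```python
import numpy as np


def preisach_triangle(n, lo=-1.0, hi=1.0):
    """Cell midpoints (rho1, rho2) of an n x n grid on [lo, hi]^2 that lie in the
    Preisach triangle rho1 < rho2, together with the common cell area."""
    edges = np.linspace(lo, hi, n + 1)
    mid = 0.5 * (edges[:-1] + edges[1:])
    r1, r2 = np.meshgrid(mid, mid, indexing="ij")
    keep = r1 < r2
    return r1[keep], r2[keep], ((hi - lo) / n) ** 2


def preisach(u_samples, rho1, rho2, weight, xi):
    """Discrete Preisach operator w(t_j) = sum over relays of weight * state.

    weight: p(rho) times the cell area for each relay (quadrature of the integral)
    xi:     initial relay states in {-1, +1}
    """
    state = np.asarray(xi, dtype=float).copy()
    w = []
    for u in u_samples:
        state[u >= rho2] = 1.0   # relays whose upper threshold is reached switch on
        state[u <= rho1] = -1.0  # relays whose lower threshold is reached switch off
        # Sum over S+ minus sum over S-, i.e. the splitting formula above.
        w.append(np.sum(weight * state))
    return np.array(w)


r1, r2, area = preisach_triangle(20)
weight = np.full(r1.shape, area)  # hypothetical uniform Preisach density p = 1
negative_saturation = -np.ones_like(r1)
up = preisach([-1.0, 0.0], r1, r2, weight, negative_saturation)[-1]
down = preisach([1.0, 0.0], r1, r2, weight, negative_saturation)[-1]
print(up, down)  # same input value u = 0, different outputs: hysteresis
```

Reaching u = 0 from below and from above yields outputs of opposite sign, which is the memory effect shown by the minor loops of Fig. 11.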


We present some results obtained by solving the above model for a real crankshaft
(see Fig. 14 for input data). Figure 15 shows the remanent magnetization after the
circular magnetization process, while Fig. 16 shows the applied demagnetization
current and the remanent magnetization after demagnetizing.


5 Accelerated Simulation of Electric Machines 


In the design of electric machines (see Fig. 17), numerical simulation is an important
tool. The engineer needs to know the behaviour of the machine in steady regime; in
particular, he/she wants to know the torque. In order to get this steady state, finite
element methods are used to solve a transient nonlinear system of PDEs derived
from Maxwell's equations, coupled with electrical circuit equations, starting from
an (arbitrary) initial condition until the steady state is achieved. The time for this
transient model to attain the steady state highly depends on the choice of the initial
condition. When an inappropriate value is prescribed (for instance, when it is set to
zero), a very long CPU time is needed to reach the steady-state solution. Therefore,
techniques leading to a suitable initial condition are in high demand, and several
approaches to the problem can be found in the literature: for instance,
time-periodic finite element methods [21], time-periodic explicit error correction
methods [16], time differential correction [20] and parareal algorithms [22]. A common
drawback of these methods is the need to choose a suitable time interval in which
the solution is assumed to be periodic: the so-called effective period. Indeed, magnetic
fields in rotor and stator oscillate at different frequencies, and the common time at
which both are periodic is generally quite large. However, the periodicity condition
has to be defined in a short time interval for the method to be useful. Our methodology
aims to compute a suitable initial condition and has the advantage of making use of the
periodicity property only in the rotor bars, so the above limitation does not apply.
Moreover, the computational cost of our approach does not depend on the size of this
period, and the number of unknowns is very small in comparison with the previously
mentioned methods.

Fig. 15 Remanent magnetization

Fig. 16 Demagnetization current (left) and remanent magnetization after demagnetizing (right)

Fig. 17 Main parts of an induction motor (labels: coil side, shaft, end-ring, rotor bar).
From Wikimedia Commons by Mtodorov 69 under license CC-BY-SA-3.0

This work has been developed under contract with the company Robert Bosch
GmbH from Stuttgart (Stefan Kurz, Marcus Alexander). It has given rise to a Spanish
patent. A detailed description of the methodology has been published in papers [1]
and [3].


5.1 Description of the New Methodology 


The main lines of the developed methodology can be described for a toy model. Let
us consider a simple series circuit with an inductor and a resistor,

$$L\,I'(t) + R\,I(t) = E(t),$$

with the electromotive force

$$E(t) = E \sin(\omega t).$$

Fig. 18 A quarter of the geometric domain at time t = 0 (left) and t > 0 (right), showing
conductors and magnetic cores. Modification of a picture provided by Robert Bosch GmbH


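The general solution is

$$I(t) = \underbrace{A\,e^{-\frac{R}{L}t}}_{\text{transient part}} + \underbrace{\frac{E}{|Z(\omega)|}\,\sin(\omega t - \varphi(\omega))}_{\text{steady solution}},$$

where Z(ω) = R + ωL i ∈ ℂ is the impedance of the circuit and φ(ω) its argument.
Denoting by T = 2π/ω the period of the source, we have two opposite extreme situations:

• If RT/L ≫ 1, then the exponential vanishes quickly, independently of the initial
condition.

• If RT/L ≪ 1, then the transient part strongly depends on the initial condition.
Moreover, in this case

$$\varphi(\omega) \approx \frac{\pi}{2} \quad \text{and} \quad |Z(\omega)| \approx \omega L,$$

and hence

$$I(t) \approx A\,e^{-\frac{R}{L}t} - \frac{E}{\omega L}\cos(\omega t).$$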
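The role of the initial condition in the regime RT/L ≪ 1 can be checked with the exact solution: since I(0) = A + (E/|Z|) sin(−φ), the transient amplitude is A = I(0) − (E/|Z|) sin(−φ). The Python sketch below computes A for the null initial condition and for I(0) = −E/(ωL); the numerical values of E, ω, L and R are hypothetical, chosen only so that RT/L ≪ 1.

```python
import cmath
import math


def transient_amplitude(I0, E, omega, L, R):
    """Coefficient A of the transient part A*exp(-R t / L) in the exact solution
    of L I' + R I = E sin(omega t) with initial current I(0) = I0."""
    Z = complex(R, omega * L)                     # impedance Z = R + i omega L
    phi = cmath.phase(Z)                          # its argument
    steady_at_zero = E / abs(Z) * math.sin(-phi)  # steady solution at t = 0
    return I0 - steady_at_zero


# Hypothetical data with R T / L << 1, where T = 2 pi / omega is the period.
E, omega, L, R = 1.0, 2 * math.pi * 50.0, 1e-2, 1e-3
T = 2 * math.pi / omega
A_null = transient_amplitude(0.0, E, omega, L, R)
A_tuned = transient_amplitude(-E / (omega * L), E, omega, L, R)
print("R T / L =", R * T / L)
print("A for I(0) = 0:           ", A_null)
print("A for I(0) = -E/(omega L):", A_tuned)
```

With the null initial condition the transient amplitude is essentially E/(ωL), and since R/L is small it decays only over many periods; with I(0) = −E/(ωL) it is negligible from the start.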


If the equation is solved for I(0) = 0, then A ≈ E/(ωL) and the solution is approximately
given by

$$I(t) \approx \frac{E}{\omega L}\,e^{-\frac{R}{L}t} - \frac{E}{\omega L}\cos(\omega t),$$

so it includes a transient part. However, if the equation is solved for

$$I(0) = -\frac{E}{\omega L},$$

then A = 0 and the transient part is close to zero from the beginning. The important
remark is that, if RT/L ≪ 1, then the above initial condition can be obtained without
solving the ODE, as follows:

• Firstly, the term involving the resistor can be neglected.

• Then, we integrate the equation twice: first between 0 and t and then between 0
and T. We get

$$L\int_0^T I(t)\,dt - T L\,I(0) = E\int_0^T (T - s)\sin(\omega s)\,ds = \frac{E T}{\omega}.$$

• Moreover, since the steady solution is harmonic, ∫₀^T I(t) dt = 0, and from the
above equation we deduce

$$I(0) = -\frac{1}{L T}\,\frac{E T}{\omega} = -\frac{E}{\omega L},$$

which is the suitable initial condition previously obtained. The interesting feature of
this method is that it can be used in more general settings, in particular for the model
of induction machines with a squirrel cage. In this case, the problem to be solved is
the following:

Given currents along the coil sides I_n(t), n = N_b + 1, …, N_c, and initial currents
along the bars γ_n^0, n = 1, …, N_b, find, for every t ∈ [0, T], currents γ_n(t), n =
1, …, N_b, along the bars such that γ_n(0) = γ_n^0, n = 1, …, N_b, and


REF (ty) + (i^ + (45) B^! (4^) y^() - A() (ey (2) =0, 
bib 0 _ 
A’y o-(2) =0, 


where F : [0, T] × ℝ^{N_b} → ℝ^{N_b} is the nonlinear operator defined as

$$F(t, w) := \left(\int_{\Omega_1} \sigma A(x, y, t)\,dx\,dy, \ \ldots, \ \int_{\Omega_{N_b}} \sigma A(x, y, t)\,dx\,dy\right)^T \in \mathbb{R}^{N_b},$$

for t ∈ [0, T], w ∈ ℝ^{N_b}, with A(x, y, t) the solution to the following nonlinear
magnetostatic problem:
Given a fixed t ∈ [0, T], currents along the coil sides I_n(t), n = N_b + 1, …, N_c,
and w ∈ ℝ^{N_b}, find a field A(x, y, t) such that
— div(vo gradA) =0 in $2" U r, (2) ]


Wn . 
— di dA) = ———— 2,,n=1,...,Np, 
iv(vo grad A) meas) in n b 
L,(t) 
— div(vp grad A) = ————— inr,((2,)), n= Np 4-1, ..., No, 
meas(S2,,) 


—div(v(-, |gradA|) gradA) = 0 in Qh U r, (2%) 


nl 


with suitable transmission and boundary conditions. 
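Returning to the toy RL circuit, the double-integration recipe (neglect R, integrate L I' = E sin(ωt) from 0 to t and then from 0 to T, and impose a zero mean current over one period) can be sketched numerically. By Fubini's theorem the double integral of the source becomes the single integral ∫₀^T (T − s) E sin(ωs) ds, which the sketch evaluates with the trapezoidal rule; the parameter values in the usage line are hypothetical.

```python
import math

import numpy as np


def initial_current_by_double_integration(E, omega, L, n=200001):
    """Estimate I(0) for L I' = E sin(omega t) (resistor neglected) from
    L * int_0^T I dt - T L I(0) = int_0^T (T - s) E sin(omega s) ds,
    imposing int_0^T I dt = 0 over one period T = 2 pi / omega."""
    T = 2 * math.pi / omega
    s = np.linspace(0.0, T, n)
    f = (T - s) * E * np.sin(omega * s)
    h = s[1] - s[0]
    rhs = h * (f.sum() - 0.5 * (f[0] + f[-1]))  # trapezoidal rule, close to E T / omega
    return -rhs / (T * L)


E, omega, L = 1.0, 2 * math.pi * 50.0, 1e-2  # hypothetical data
print(initial_current_by_double_integration(E, omega, L), -E / (omega * L))
```

Only quadratures of known data are needed, not a time integration of the ODE; this is the property that carries over to the squirrel-cage model above.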


5.2 Numerical Experiments with Real Electric Machines 


We present the numerical results obtained for a particular induction machine with a
squirrel cage rotor. Firstly, we use our method to get a suitable initial condition. Next,
we solve the transient model with this initial condition and compare the time needed
to reach the steady state with the one needed when taking a null initial condition. The
electric machine we have used for numerical experiments can be seen in Figs. 18 and
19. For confidentiality reasons, it is a modification of a picture provided by Robert Bosch
GmbH. Red, yellow and blue colors correspond to the three different phases. It is
composed of 36 slots in the rotor and 48 slots in the stator. It is a three-phase machine
having 2 pole pairs with 12 slots per pole. The source currents are characterized by
an electrical frequency f_e and an RMS current I_c through each slot. The currents
corresponding to each phase of the stator are defined as


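$$I_A(t) = \sqrt{2}\,I_c \cos(2\pi f_e t),$$

$$I_B(t) = \sqrt{2}\,I_c \cos\left(2\pi f_e t + \frac{2\pi}{3}\right),$$

$$I_C(t) = \sqrt{2}\,I_c \cos\left(2\pi f_e t - \frac{2\pi}{3}\right).$$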
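The three phase currents form a balanced system: they add up to zero at every instant and each has RMS value I_c. The small Python sketch below evaluates them and checks both properties over one electrical period for the data of operating point 1 in Table 2; the function name `stator_currents` is hypothetical.

```python
import math


def stator_currents(t, f_e, I_c):
    """Balanced three-phase stator currents (A) at time t (s), for electrical
    frequency f_e (Hz) and RMS current I_c (A)."""
    theta = 2 * math.pi * f_e * t
    amplitude = math.sqrt(2) * I_c
    return (amplitude * math.cos(theta),
            amplitude * math.cos(theta + 2 * math.pi / 3),
            amplitude * math.cos(theta - 2 * math.pi / 3))


f_e, I_c = 42.1, 675.0  # operating point 1 in Table 2
N = 1000
period = 1.0 / f_e
samples = [stator_currents(k * period / N, f_e, I_c) for k in range(N)]
rms_A = math.sqrt(sum(a * a for a, _, _ in samples) / N)
print(f"RMS of phase A over one period: {rms_A:.3f} A")
```

Sampling uniformly over a full period makes the discrete mean of cos² exactly 1/2, so the computed RMS matches I_c up to rounding.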


We have considered four operating points corresponding to different electrical
sources in the stator and different rotor velocities. They are described in Table 2.
The physical time to reach the steady state for the different operating points can
be seen in Table 3. Finally, in Fig. 20, the computed torque and current along the
transient simulation are shown for operating point 4.
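The gain reported in Table 3 can be summarized as the ratio between the physical time to reach the steady state with the null initial condition and with the initial condition given by the new method. The short Python snippet below only performs this arithmetic on the values of Table 3; it measures simulated time, not CPU time.

```python
# Physical time (s) to reach the steady state, from Table 3:
# (null initial condition, initial condition from the new method).
t_steady = {
    1: (0.1200, 0.0600),
    2: (0.0840, 0.0120),
    3: (0.2100, 0.0550),
    4: (0.3467, 0.0133),
}

factors = {op: t_null / t_new for op, (t_null, t_new) in t_steady.items()}
for op, factor in factors.items():
    print(f"Op. Point {op}: reduction factor {factor:.1f}")
```

The reduction ranges from a factor of about 2 (operating point 1) to about 26 (operating point 4).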

Notes and Comments. 


• We have presented four case studies in industrial mathematics, all related to
numerical simulation by partial differential equations.

• In addition to the industrial outcome, in all cases scientific papers related to the
developed methods have been published.

• This shows that industrial problems usually lead to new mathematical developments.
Fig. 19 Transversal section of an induction electric motor with squirrel cage


Table 2 Operation points for numerical tests

              f_e (Hz)    n_r (rpm)    I_c (A RMS)
Op. Point 1   42.1        1000         675
Op. Point 2   171.2       5000         314
Op. Point 3   417.5       12,000       675
Op. Point 4   632.0       18,000       531


Table 3 Time to get the steady state with null initial condition and with the one obtained by the
new method

              Initial condition    T_steady (s)
Op. Point 1   γ(0) = 0             0.1200
              γ(0) = γ_u           0.0600
Op. Point 2   γ(0) = 0             0.0840
              γ(0) = γ_u           0.0120
Op. Point 3   γ(0) = 0             0.2100
              γ(0) = γ_u           0.0550
Op. Point 4   γ(0) = 0             0.3467
              γ(0) = γ_u           0.0133
Fig. 20 Op. Point 4. Torque versus time (left). Current in bar 1 versus time (right)


• Industrial mathematics is a nice area with good opportunities for young mathematicians
who are also willing to learn other scientific disciplines.

• Postgraduate studies mixing applied mathematics and areas of application such as
physics, chemistry, biology, medicine, economics, etc. are a good initial step to
develop a career in this promising area of increasing interest for companies and
research institutions.
Abstract The first step in our sensing of smell is the conversion of chemical odorants
into electrical signals. This happens when odorants stimulate ion channels along
cilia, which are long thin cylindrical structures in our olfactory system. Determining
how the ion channels are distributed along the length of a cilium is beyond current
experimental methods. Here we describe how this can be approached as a mathematical
inverse problem. Identification of specific functions of receptor neuron arrays
is a major challenge today in both Mathematics and Biosciences. In this paper, two
mathematical models based on integral equations are studied for the inverse problem
of determining the distribution of ion channels in cilia of olfactory neurons from
experimental data.


1 Introduction 


The first step in sensing smell is the transduction (or conversion) of chemical
information into an electrical signal that goes to the brain. Pheromones and odorants,
which are small molecules with the chemical characteristics of an odor, are found
all throughout our environment. The olfactory system (part of the sensory system
we use to smell) performs the task of receiving these odorant molecules in the nasal
mucosa and triggering the physical-chemical processes that generate the electric
current that travels to the brain (see Fig. 1 and Sect. 1.1).

What happens next is a mystery. Intuition tells us that the electrical wave generated
gives rise to an emotion in the brain, which in turn affects our behavior. Of course, the
workings of our other four senses are similarly a mystery. And so, we quickly come to
perhaps one of the most fundamental questions in neuroscience for the future: How
does our consciousness process external stimuli once reduced to electro-chemical
waves and, over time, how does this mechanism lead us to become who we are?

Fig. 1 Odorants reaching the nasal mucus (left) and structure of an olfactory receptor neuron (right)

How can we approach this problem with mathematics? Faced with these reflec- 
tions, applied mathematicians take time to stop and wonder if it is possible to provide 
such far-reaching phenomena with a mathematical representation that allows us to 
understand and act. Biology is synonymous with “function”, so the study of biologi- 
cal systems should start by understanding the corresponding underlying physiology. 
Consequently, to obtain a proper mathematical representation of the transduction of 
an odor into an electrical signal, and before any mathematical intervention, we must 
first detect which atomic populations are involved in the process and identify their 
respective functions. 


1.1 Transduction of Olfactory Signals 


The molecular machinery that carries out this work is in the olfactory cilia. Cilia are 
long, thin cylindrical structures that extend from an olfactory receptor neuron into 
the nasal mucus (Fig. 1). 

The transduction of an odor begins with pheromones binding to specific receptors
on the external membrane of cilia. When an odorant molecule binds to an olfactory
receptor on a cilium membrane, it successively activates an enzyme, which increases
the levels of a ligand or chemical messenger named cyclic adenosine monophosphate
(cAMP) within the cilia. As a result, cAMP molecules diffuse through
the interior of the cilia. Some of the cAMP molecules bind to cyclic nucleotide-gated
(CNG) ion channels, causing them to open. This allows an influx of positively
charged ions into the cilium (mostly Ca²⁺ and Na⁺, as illustrated in Fig. 2), which
causes the neuron to depolarize, generating an excitatory response. This response
is characterized by a voltage difference between the two sides of the membrane,
which in turn initiates the electrical current. This is the overall process that human
beings share with all mammals and reptiles to smell and differentiate odors.

Fig. 2 Signal transduction mechanism for the olfactory system. a In the absence of stimulus,
channels are closed and the system is at resting state. b Binding of odorants triggers cAMP
synthesis and opening of CNG channels, leading to Ca²⁺ and Na⁺ transport and a Cl⁻ flux


1.2 Kleene's Experimental Procedure


Experimental techniques for isolating a single cilium (from a grass frog) were developed
by biochemist and neuroscientist Steven J. Kleene and his research team at the
University of Cincinnati in the early 1990s [5, 6]. One olfactory cilium of a receptor
neuron is detached at its base and stretched tight into a recording pipette. The cilium
is immersed in a cAMP bath, and the current generated inside the cilium by the
mechanism described above is recorded.

Although the properties of a single channel have been described successfully
using these experimental techniques, the distribution of these channels along the
cilia still remains unknown, and may well turn out to be crucial in determining the
kinetics of the neuronal response. Ionic channels, in particular CNG channels, are
called "micro-domains" in biochemistry because of their practically imperceptible
size. This makes their experimental description with current technology very
difficult.


1.3 An Integral Equation Model 


Given the experimental difficulties, there is a clear opportunity for mathematics to
inform biology. Determining the distribution of ion channels along the length of a cilium
from experimental measurements of the transmembrane current is usually
categorized in physics and mathematics as an inverse problem. Around 2006, a
multidisciplinary team (which brought together mathematicians with biochemists
and neuroscientists, as well as a chemical engineer) developed and published a first
mathematical model [4] to simulate Kleene's experiments. The distribution of CNG
channels along the cilium appears in it as the main unknown of a nonlinear integral
equation model.

This model gave rise to a simple numerical method for obtaining estimates of the
spatial distribution of CNG ion channels. However, specific computations revealed
that the mathematical problem is poorly conditioned. This is a general difficulty in
inverse problems, where the corresponding mathematical problem is usually ill-posed
in the sense of Hadamard (a problem is well-posed when a solution exists, is unique
and depends continuously on the data); in particular, it is often unstable with respect
to the data. As a consequence, its numerical resolution often results in ill-conditioned
approximations.

The essential nonlinearity in the previous model arises from the binding of the
channel-activating ligand (cAMP molecules) to the CNG ion channels as the ligand
diffuses along the cilium. In 2007, mathematicians D. A. French and C. W. Groetsch
introduced a simplified model, in which the binding mechanism is neglected, leading
to a linear Fredholm integral equation of the first kind with a diffusive kernel.
The inverse mathematical problem consists of determining a density function, say
ρ = ρ(x) ≥ 0 (representing the distribution of CNG channels), from measurements
in time of the transmembrane electrical current, denoted I_P[ρ]. This mathematical
equation for ρ is the following integral equation: for all t > 0,

$$I_P[\rho](t) = \int_0^L \rho(x)\,P(c(t, x))\,dx, \qquad (1)$$


where P is known as the Hill function of exponent n > 0 (see Fig. 3). It is defined
by:

$$P(w) = \frac{w^n}{w^n + K_{1/2}^n}, \qquad \forall w \ge 0.$$

Fig. 3 The Hill function P

In this definition, the exponent n is an experimentally determined parameter and
K_{1/2} > 0 is a constant which represents the half-bulk (i.e., the ligand concentration
for which half the binding sites are occupied); typical values for n in humans are
n ≈ 2. Besides, in the linear integral equation above, c(t, x) denotes the concentration
of cAMP that diffuses along the cilium with a diffusivity constant that we denote
as D; L denotes the length of the cilium, which for simplicity is assumed to be
one-dimensional. Here, by concentration we mean the molar concentration, i.e., the
amount of solute in the solvent in a unit volume; it is a nonnegative real number.

Hill-type functions are extensively used in biochemistry to model the fraction of
ligand bound to a macromolecule as a function of the ligand concentration and, hence,
the quantity P(c(t, x)) models the probability of the opening of a CNG channel as a
function of the cAMP concentration. The diffusion equation for the concentration of
cAMP can be explicitly solved if the length of the cilium L is supposed to be infinite.
It is given by:

$$c(t, x) = c_0\,\operatorname{erfc}\left(\frac{x}{2\sqrt{D t}}\right),$$

where c₀ > 0 is the maintained concentration of cAMP with which the pipette comes
into contact at the open end (x = 0) of the cilium (while x = L is the closed end).
Here, erfc is the standard complementary Gauss error function,

$$\operatorname{erfc}(x) := 1 - \frac{2}{\sqrt{\pi}}\int_0^x e^{-r^2}\,dr.$$

Accordingly, it is straightforward to check that c is increasing in t and decreasing in x,
and that it remains bounded for all (t, x), 0 < c(t, x) < c₀.
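The direct problem associated with model (1), computing the current from a given channel density, can be sketched in a few lines of Python. This is an illustration only: the values of c₀, D, K_{1/2}, n and L, and the density ρ(x) = 1 + x used in the check, are hypothetical, and the integral is approximated with a midpoint rule.

```python
import math

import numpy as np

# Hypothetical parameter values, chosen only for illustration.
c0 = 1.0        # maintained cAMP concentration at the open end
D = 1.0         # diffusivity
K_half = 0.5    # half-bulk constant K_{1/2}
n_hill = 2.0    # Hill exponent n
L_cilium = 1.0  # cilium length


def concentration(t, x):
    """c(t, x) = c0 * erfc(x / (2 sqrt(D t))) for t > 0 (semi-infinite cilium)."""
    return c0 * math.erfc(x / (2.0 * math.sqrt(D * t)))


def hill(w):
    """Hill function P(w) = w^n / (w^n + K_{1/2}^n) for w >= 0."""
    return w**n_hill / (w**n_hill + K_half**n_hill)


def current_model1(rho, t, m=2000):
    """Direct problem for model (1): I_P[rho](t) = int_0^L rho(x) P(c(t, x)) dx,
    approximated with the midpoint rule on m cells."""
    dx = L_cilium / m
    xs = (np.arange(m) + 0.5) * dx
    return sum(rho(x) * hill(concentration(t, x)) for x in xs) * dx


rho = lambda x: 1.0 + x  # hypothetical channel density
print([round(current_model1(rho, t), 4) for t in (0.01, 0.1, 1.0)])
```

Because the kernel P(c(t, x)) is smooth in both variables, very different densities ρ can produce almost identical current curves, which is the practical face of the ill-posedness discussed next.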

Despite its elegance (by virtue of the simplicity of its formulation), this new model
does not overcome the difficulties encountered in its nonlinear version. In fact, the
mathematical inverse problem associated with model (1) can be shown to be ill-posed.
More precisely, since P(c(t, x)) is a smooth mapping, the operator ρ ↦ I_P[ρ] is
compact from L^p(0, L) to L^p(0, T) for every L, T > 0, 1 < p < ∞. Thus, even if
the operator I_P were injective, its inverse would not be continuous because, if so,
then the identity map in L^p(0, L) would be compact, which is known to be false.


1.4 Non-diffusive Kernels 


This last result certainly has a more general character. In fact, it is clear from its proof
that any model based on an integral equation of the first kind with a smooth diffusive
kernel necessarily makes the problem of recovering the density from measurements of
the electrical current ill-posed.

An initial, natural approach to tackling this anomaly in model (1) was developed
in Conca et al. [3]. It exploited the fact that the Hill function converges pointwise
to a single step function as the exponent n goes to +∞: the strategy was to
approximate P using a multiple step function.

Based on different assumptions on the spaces where the unknown ρ is sought,
theoretical results of identifiability, stability and reconstruction were obtained for the
corresponding inverse problem. However, numerical methods for generating
estimates of the spatial distribution of ion channels revealed that this class of models is
not satisfactory for practical purposes. The only feasible estimates for ρ are obtained
for multiple step functions that are very close to a single-step function or, equivalently,
for Hill functions with very large exponents, which imply the use of unrealistic
models.

Another way to overcome the ill-posedness of the inverse problem in (1) consists
of replacing the kernel of the integral equation with a non-smooth variant of the Hill
function.

Specifically, let a ∈ (0, ∞) be a given real parameter. A discontinuous version of
P is obtained by forcing a saturation state for concentrations higher than a. By doing
so, one is led to introduce the following disruptive variant of P (shown in Fig. 4):

$$H(c) = P(c)\,\mathbb{1}_{[0, a]}(c) + \mathbb{1}_{(a, +\infty)}(c),$$

where 𝟙_J denotes the characteristic function of the interval J. The mathematical
problem that recovers ρ from the electrical current data is therefore modelled by


$$I_H[\rho](t) = \int_0^L \rho(x)\,H(c(t, x))\,dx, \qquad (2)$$


where c(t, x) is still defined as before. The introduction of this disruptive Hill function
can be understood mathematically as follows: as t → ∞, the factor x/(2√(Dt)) in the
complementary error function defining the concentration tends to 0, and consequently
c(t, x) tends pointwise to c₀. An inverse mathematical problem and a direct problem
are associated with both models (1) and (2). In the inverse problem, the electric current
is measured and the unknown is the density ρ of ion channels, while in the direct problem
the opposite is true. Since these are Fredholm equations of the first kind, it is natural to
tackle them using convolution. Once ρ has been extended by zero to [0, ∞), the Mellin
transform turns out to be the most appropriate tool for carrying out this task (see the
overview section "Mellin transform").

Fig. 4 A disruptive variant H(c) of P (a = 0.157)


2 A General Convolution Equation 


The Mellin transform is the appropriate tool to study model (2), since it allows one to
rewrite it as a convolution equation of Mellin type. To do so, the key observation is the
fact that H(c(t, x)) can be written in terms of √t/x. Indeed, defining G as

$$G(z) = H\left(c_0\,\operatorname{erfc}\left(\frac{1}{2\sqrt{D}\,z}\right)\right), \qquad z > 0,$$

we have I_H[ρ](t) = ∫₀^L ρ(x) G(√t/x) dx. Thus, by extending ρ by zero to [0, ∞), and
rescaling time t into t², we obtain

$$I_H[\rho](t^2) = \int_0^\infty x\rho(x)\,G\left(\frac{t}{x}\right)\frac{dx}{x} = \big((x\rho) \star G\big)(t),$$

which is a convolution equation in xρ(x).
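The key identity behind the convolution form, H(c(t², x)) = G(t/x), can be checked numerically. The Python sketch below implements the disruptive Hill function and G; the threshold a = 0.157 is the value quoted for Fig. 4, while c₀, D, K_{1/2} and n are hypothetical, and the test points are chosen away from the discontinuity c = a.

```python
import math

# Hypothetical parameters; a = 0.157 is the value used in Fig. 4.
c0, D, K_half, n_hill = 1.0, 1.0, 0.5, 2.0
a = 0.157


def hill(w):
    """Hill function P(w) = w^n / (w^n + K_{1/2}^n)."""
    return w**n_hill / (w**n_hill + K_half**n_hill)


def disruptive_hill(c):
    """H(c) = P(c) on [0, a] and 1 for c > a (forced saturation)."""
    return hill(c) if c <= a else 1.0


def concentration(t, x):
    """c(t, x) = c0 * erfc(x / (2 sqrt(D t)))."""
    return c0 * math.erfc(x / (2.0 * math.sqrt(D * t)))


def G(z):
    """G(z) = H(c0 erfc(1 / (2 sqrt(D) z))) for z > 0."""
    return disruptive_hill(c0 * math.erfc(1.0 / (2.0 * math.sqrt(D) * z)))


for t, x in [(0.5, 0.3), (1.0, 3.0), (2.0, 0.1), (1.0, 4.0)]:
    print(t, x, disruptive_hill(concentration(t * t, x)), G(t / x))
```

The kernel then depends on (t, x) only through the ratio t/x, which is exactly what makes the Mellin transform applicable.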
Taking the Mellin transform on both sides and using its operational properties, we
formally obtain

$$\frac{1}{2}\,\mathcal{M}I_H[\rho](s/2) = \mathcal{M}G(s)\,\mathcal{M}\rho(s + 1),$$

or equivalently,

$$\mathcal{M}\rho(s + 1) = \frac{\mathcal{M}I_H[\rho](s/2)}{2\,\mathcal{M}G(s)}. \qquad (3)$$


Finnish mathematician Robert Hjalmar Mellin (1854-1933) gave his name
to the so-called Mellin transform, whose definition and properties are recalled
below. The interested reader is referred to §2 of [1] or Lindelöf [7] for a
summary of his work and proofs of the main results around this transform.

For q ∈ ℝ, q + iℝ will denote the vertical line {q + it, t ∈ ℝ} of the
complex plane having abscissa q, and for p ∈ ℝ (p ≥ 1), L^p([0, ∞), x^q), or
simply L^p_q, will stand for the Lebesgue space with the weight x^q, i.e.,

$$L^p_q = \left\{ f : [0, \infty) \to \mathbb{R} \;\middle|\; \|f\|_{L^p_q} < +\infty \right\},$$

where $\|f\|_{L^p_q} = \left(\int_0^\infty |f(x)|^p\,x^q\,dx\right)^{1/p}$. L^p_q, endowed with this norm, is a Banach
space.
Let f be in L¹([0, ∞), x^q). The Mellin transform of f is a complex-valued
function defined on the vertical line q + 1 + iℝ by

$$\mathcal{M}f(s) = \int_0^\infty f(x)\,x^s\,\frac{dx}{x}.$$

From its very definition, it is observed that the Mellin transform maps functions
defined on [0, ∞) into functions defined on q + 1 + iℝ. As for the Fourier
transform, 𝓜f is continuous whenever f is in L¹([0, ∞), x^q). Specifically,
we have


Theorem 1 (Riemann-Lebesgue) The Mellin transform is a linear continuous
map from L¹([0, ∞), x^q) into C⁰(q + 1 + iℝ; ℂ) ∩ L^∞(q + 1 + iℝ; ℂ);
its operator norm is 1.

Proposition 1 If f is in L¹_q for every real number q in (a, b), then its Mellin
transform 𝓜f(·) is holomorphic in the strip S = {s ∈ ℂ | a + 1 < Re(s) <
b + 1}.
The following table summarizes the main operational properties of the
Mellin transform:

Function              Mellin transform
f(ax), a > 0          a^{−s} 𝓜f(s)
f(x^a), a ≠ 0         (1/|a|) 𝓜f(s/a)
f^{(k)}(x)            (−1)^k (s − k)_k 𝓜f(s − k)

where, ∀x ∈ ℝ and ∀k ≥ 1, (x)_k stands for the so-called Pochhammer symbol,
which is defined by

$$(x)_k = x(x + 1)\cdots(x + k - 1) = \prod_{j=0}^{k-1}(x + j) \quad \text{if } k \ge 1,$$

and (x)_0 = 1, where x is in ℝ.
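The definition and the operational properties can be checked numerically for real s. The Python sketch below substitutes x = e^u, so that 𝓜f(s) = ∫_ℝ f(e^u) e^{su} du, and applies the trapezoidal rule on a truncated interval; it tests the table on f(x) = e^{−x}, whose Mellin transform is Γ(s). The truncation limits and the function names are illustrative choices, suitable only for functions that decay fast at both ends of the u-axis.

```python
import math

import numpy as np


def mellin(f, s, u_min=-40.0, u_max=10.0, n=20001):
    """Approximate Mf(s) = int_0^inf f(x) x^(s-1) dx for real s.

    With x = e^u this becomes int_R f(e^u) e^(s u) du, evaluated with the
    trapezoidal rule on [u_min, u_max]."""
    u = np.linspace(u_min, u_max, n)
    vals = f(np.exp(u)) * np.exp(s * u)
    h = u[1] - u[0]
    return h * (vals.sum() - 0.5 * (vals[0] + vals[-1]))


def pochhammer(x, k):
    """Pochhammer symbol (x)_k = x (x + 1) ... (x + k - 1), with (x)_0 = 1."""
    result = 1.0
    for j in range(k):
        result *= x + j
    return result


f = lambda x: np.exp(-x)  # Mellin transform: Gamma(s)
s = 3.5
print(mellin(f, s), math.gamma(s))
```

The same routine reproduces the scaling rule (f(3x) gives 3^{−s} Γ(s)) and the derivative rule, e.g. for f' = −e^{−x}, 𝓜f'(s) = −(s − 1)_1 𝓜f(s − 1).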


2.1 A Priori Estimates 


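Seeking continuity and observability inequalities for model (2) then reduces to
finding lower and upper bounds for 𝓜G(·) in suitable weighted Lebesgue spaces.
Doing so, one obtains

Theorem 2 (A Priori Estimates) Let k ∈ ℕ ∪ {0} and r ∈ ℝ be arbitrary. Assume
that the Mellin transforms of ρ and I_H[ρ] satisfy (3). Then

$$C_k^-\,\|\rho\|_{L^2_r} \le \left\|\frac{d^k}{dt^k} I_H[\rho]\right\|_{L^2_{2k + \frac{r-3}{2}}} \le C_k^+\,\|\rho\|_{L^2_r},$$

where

$$C_k^- = \sqrt{2}\inf_{s \in \frac{r-1}{2} + i\mathbb{R}} \left|\left(\frac{s}{2}\right)_k \mathcal{M}G(s)\right|, \qquad C_k^+ = \sqrt{2}\sup_{s \in \frac{r-1}{2} + i\mathbb{R}} \left|\left(\frac{s}{2}\right)_k \mathcal{M}G(s)\right|,$$

and L^p_q = L^p([0, ∞), x^q) stands for the Lebesgue space with the weight x^q, p ≥ 1,
q ∈ ℝ.

Remark 1 It is worth noting that C_k^−, C_k^+ could a priori range from 0 to +∞.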


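Proof Using the properties of the Mellin transform in Eq. (3), evaluated at 2(s − k) and
multiplied by the Pochhammer symbol (s − k)_k, it follows that

$$(s - k)_k\,\mathcal{M}I_H[\rho](s - k) = 2\,(s - k)_k\,\mathcal{M}G(2(s - k))\,\mathcal{M}\rho(2(s - k) + 1). \qquad (4)$$

Thanks to Parseval-Plancherel's isomorphism, for every s in q + iℝ, we have

$$\begin{aligned}
\left\|\frac{d^k}{dt^k} I_H[\rho]\right\|_{L^2_{2q-1}} &= \frac{1}{\sqrt{2\pi}}\,\big\|(s - k)_k\,\mathcal{M}I_H[\rho](s - k)\big\|_{L^2(q + i\mathbb{R})} \\
&= \frac{2}{\sqrt{2\pi}}\,\big\|(s - k)_k\,\mathcal{M}G(2(s - k))\,\mathcal{M}\rho(2(s - k) + 1)\big\|_{L^2(q + i\mathbb{R})} \\
&= \frac{2}{\sqrt{2\pi}}\,\big\|(s)_k\,\mathcal{M}G(2s)\,\mathcal{M}\rho(2s + 1)\big\|_{L^2(q - k + i\mathbb{R})} \\
&= \frac{1}{\sqrt{\pi}}\,\left\|\left(\frac{s}{2}\right)_k \mathcal{M}G(s)\,\mathcal{M}\rho(s + 1)\right\|_{L^2(2(q - k) + i\mathbb{R})}. \qquad (5)
\end{aligned}$$

As 𝓜 is an isometry (up to the constant √(2π)) from L²_{4(q−k)+1} onto L²(2(q − k) + 1 + iℝ),

$$\|\mathcal{M}\rho(s + 1)\|_{L^2(2(q - k) + i\mathbb{R})} = \|\mathcal{M}\rho\|_{L^2(2(q - k) + 1 + i\mathbb{R})} = \sqrt{2\pi}\,\|\rho\|_{L^2_{4(q-k)+1}}. \qquad (6)$$

Thanks to (5) and (6) and the definitions of C_k^−, C_k^+, we get

$$C_k^-\,\|\rho\|_{L^2_{4(q-k)+1}} \le \left\|\frac{d^k}{dt^k} I_H[\rho]\right\|_{L^2_{2q-1}} \le C_k^+\,\|\rho\|_{L^2_{4(q-k)+1}}.$$

Taking r = 4(q − k) + 1, that is, q = k + (r − 1)/4, provides the result, since then
2q − 1 = 2k + (r − 3)/2 and 2(q − k) = (r − 1)/2.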


For two given functions f, g, the multiplicative convolution f * g is defined 


as follows 


f d 
(f * g)(x) = f fog (5) = 
e y y 


Theorem 3 (Mellin Transform of a Convolution) Whenever this expression is 


well defined, we have 


MF * g)(s) = Mf (s) Mg(s) 


Finally, the classical L?-isometry has his Mellin counterpart. 
Theorem 4 (Parseval-Plancherel's Isomorphism) The Mellin transform can be extended in a unique manner to a linear isometry (up to the constant $(2\pi)^{1/2}$) from $L^2_{2q-1}$ onto the classical Lebesgue space $L^2(q + i\mathbb{R})$:

$$M \in \mathcal{L}\big(L^2_{2q-1},\, L^2(q + i\mathbb{R}, dx)\big).$$
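Theorem 4 can be illustrated in the same spirit. In this sketch (our choice of example, not from the source), $f(x) = e^{-x}$, so that $\int_0^\infty |f(x)|^2 x^{2q-1}\,dx = \Gamma(2q)\,2^{-2q}$. The Mellin transform is computed on the vertical line $q + i\mathbb{R}$ by quadrature. Its squared $L^2(q+i\mathbb{R})$ norm should equal $2\pi$ times the squared weighted norm of $f$, that is, the square of the isometry constant $(2\pi)^{1/2}$.

```python
import cmath
import math

def trapezoid(fun, a, b, h):
    """Composite trapezoidal rule on [a, b] with step h (real or complex integrand)."""
    n = int(round((b - a) / h))
    total = 0.5 * (fun(a) + fun(b))
    for i in range(1, n):
        total += fun(a + i * h)
    return total * h

def mellin_on_line(h_fun, q, tau, lo=-40.0, hi=6.0, step=0.1):
    """M h(q + i tau) = int_R h(e^u) e^{(q + i tau) u} du, truncated to [lo, hi]."""
    s = complex(q, tau)
    return trapezoid(lambda u: h_fun(math.exp(u)) * cmath.exp(s * u), lo, hi, step)

def f(x):
    return math.exp(-x)

q = 0.75
# Squared L^2 norm of Mf on the line q + iR; |Mf(q + i tau)|^2 decays like e^{-pi |tau|}.
lhs = trapezoid(lambda tau: abs(mellin_on_line(f, q, tau)) ** 2, -30.0, 30.0, 0.1)
# 2*pi times the squared L^2_{2q-1} norm of f, in closed form.
rhs = 2 * math.pi * math.gamma(2 * q) / 2 ** (2 * q)
print(lhs, rhs)
```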


3 Observability of CNG Channels 


The a priori estimates in the theorem above also allow one to determine a unique distribution of ion channels along the length of a cilium from measurements in time of the transmembrane electric current.


Theorem 5 (Existence and uniqueness of $\rho$) Let $a > 0$ and $r < 1$ be given. If $I \in L^2((0,\infty), t^{\frac{r-3}{2}})$, $I' \in L^2((0,\infty), t^{\frac{r+1}{2}})$ and $a$ is small enough, then there exists a unique $\rho \in L^2([0,\infty), x^r)$ which satisfies the following stability condition:

$$\|I\|_{L^2((0,\infty),\, t^{\frac{r-3}{2}})} + \|I'\|_{L^2((0,\infty),\, t^{\frac{r+1}{2}})} \ge C \|\rho\|_{L^2_r},$$

where $C > 0$ depends only on $a$ and $r$.
Proof The proof is based on the following technical lemmas and their corollaries.


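Lemma 1 Let $A$ and $B$ be two elements of $[0,\infty]$, $k \in \mathbb{N} \cup \{0\}$ a nonnegative integer and $f$ a function such that $f^{(j)}$ is in $L^1_j(A,B)$ for every $j = 0, \dots, k$. For every real number $t$, we have

$$\int_A^B f(x)\,x^{it}\,dx = \sum_{j=0}^{k-1} (-1)^j Q_{j+1} \Big[f^{(j)}(x)\,x^{j+1+it}\Big]_A^B + (-1)^k Q_k \int_A^B f^{(k)}(x)\,x^{k+it}\,dx,$$

where $Q_j = Q_j(t) = \prod_{n=1}^{j} \frac{1}{n+it}$ (so that $Q_0 = 1$).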


Proof We use induction on $k \in \mathbb{N}$. For $k = 0$, since $Q_0 = 1$, there is nothing to prove. We assume that the formula is true for an integer $k \in \mathbb{N}$. As $(k+1+it)\,Q_{k+1} = Q_k$, it remains to prove that

$$(k+1+it) \int_A^B f^{(k)}(x)\,x^{k+it}\,dx = \Big[f^{(k)}(x)\,x^{k+1+it}\Big]_A^B - \int_A^B f^{(k+1)}(x)\,x^{k+1+it}\,dx.$$
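As $(x^{k+1+it})' = (k+1+it)\,x^{k+it}$, the previous relation follows by integration by parts. Indeed, we have

$$\begin{aligned}
(k+1+it)\int_A^B f^{(k)}(x)\,x^{k+it}\,dx &= \int_A^B f^{(k)}(x)\,\big(x^{k+1+it}\big)'\,dx \\
&= \Big[f^{(k)}(x)\,x^{k+1+it}\Big]_A^B - \int_A^B f^{(k+1)}(x)\,x^{k+1+it}\,dx.
\end{aligned}$$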
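Lemma 1 can be checked numerically on a bounded interval. In the sketch below (an illustration of ours), $f(x) = e^{-x}$ on $[A, B] = [0.5, 3]$, so $f^{(j)}(x) = (-1)^j e^{-x}$. Both sides of the identity are evaluated with composite Simpson quadrature for several $k$ and $t$, with $Q_j(t) = \prod_{n=1}^{j} (n+it)^{-1}$.

```python
import cmath
import math

def simpson(fun, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = fun(a) + fun(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * fun(a + i * h)
    return total * h / 3

def Q(j, t):
    """Q_j(t) = prod_{n=1}^{j} 1 / (n + i t), so that Q_0 = 1."""
    out = 1 + 0j
    for n in range(1, j + 1):
        out /= complex(n, t)
    return out

def xpow(x, p):
    """x^p for x > 0 and complex p, computed as exp(p log x)."""
    return cmath.exp(p * math.log(x))

def deriv(j, x):
    """j-th derivative of f(x) = exp(-x)."""
    return (-1) ** j * math.exp(-x)

def lemma1_sides(k, t, A=0.5, B=3.0):
    """Left- and right-hand sides of Lemma 1 for f = exp(-x) on [A, B]."""
    lhs = simpson(lambda x: deriv(0, x) * xpow(x, 1j * t), A, B)
    boundary = sum(
        (-1) ** j * Q(j + 1, t)
        * (deriv(j, B) * xpow(B, j + 1 + 1j * t) - deriv(j, A) * xpow(A, j + 1 + 1j * t))
        for j in range(k)
    )
    remainder = (-1) ** k * Q(k, t) * simpson(lambda x: deriv(k, x) * xpow(x, k + 1j * t), A, B)
    return lhs, boundary + remainder

for k in (0, 1, 3):
    for t in (0.0, 1.7):
        lhs, rhs = lemma1_sides(k, t)
        print(k, t, abs(lhs - rhs))
```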


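Corollary 1 Let $f : [A,B] \to \mathbb{R}$ with $A, B \in [0,\infty]$ be a piecewise $C^1$ function. If $f$ is non-negative, $f'$ is non-positive, $f \in L^1(A,B)$, $f' \in L^1(A,B)$ and for all $t \in \mathbb{R}$: $[x f(x) x^{it}]_A^B = 0$, then

$$\sqrt{1+t^2}\,\left|\int_A^B f(x)\,x^{it}\,dx\right| \le \int_A^B f(x)\,dx.$$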
A A 


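Proof From Lemma 1 with $k = 1$ one obtains

$$\forall t \in \mathbb{R}, \quad (1+it)\int_A^B f(x)\,x^{it}\,dx = -\int_A^B f'(x)\,x^{1+it}\,dx.$$

As $A, B \ge 0$ and $f' \le 0$, using this previous identity twice, for $t \neq 0$ and for $t = 0$, we get

$$\sqrt{1+t^2}\,\left|\int_A^B f(x)\,x^{it}\,dx\right| \le \int_A^B |f'(x)|\,x\,dx = -\int_A^B f'(x)\,x\,dx = \int_A^B f(x)\,dx.$$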
A A A 


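Lemma 2 Let $\eta, K > 0$, $q \in \mathbb{R}$ and $f = \frac{\operatorname{erfc}^\eta}{\operatorname{erfc}^\eta + K}$. There exists $x_q > 0$ such that the function $g_q : x \in [x_q, \infty) \mapsto f(x)\,x^{q-1}$ is decreasing. Let $\bar q = \inf E_q$, where $E_q = \{c > 0 \mid g_q'(x) < 0 \ \forall x > c\}$. The function $q \mapsto \bar q$ is increasing and $\bar q = (q/(2\eta))^{1/2} + o(q^{1/2})$ as $q \to \infty$.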


Proof As $f > 0$, the inequality $g_q'(x) < 0$ is equivalent to

$$\frac{f'(x)}{f(x)} < -\frac{q-1}{x}. \tag{7}$$
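Let us compute $f'/f$. To do so, let $u = \operatorname{erfc}^\eta$, so that $f = \frac{u}{u+K}$. We have

$$\frac{f'}{f} = \frac{u'}{u}\,\frac{K}{u+K} = \eta\,\frac{\operatorname{erfc}'}{\operatorname{erfc}}\,\frac{K}{u+K}. \tag{8}$$

Since $\operatorname{erfc}'(x) = -2\pi^{-1/2}e^{-x^2}$ and, for $x$ large enough, $\operatorname{erfc}(x) = \pi^{-1/2}x^{-1}e^{-x^2} + O(x^{-3}e^{-x^2})$, we obtain

$$\frac{f'(x)}{f(x)} \sim \eta\,\frac{\operatorname{erfc}'(x)}{\operatorname{erfc}(x)} \sim -2\eta x \quad \text{as } x \to +\infty. \tag{9}$$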


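This asymptotic expansion proves that the inequality (7) is satisfied for large enough values of $x$. As a consequence, for every $q$ in $\mathbb{R}$, the set $E_q$ is not empty, which justifies the definition of $\bar q$. Note that the definition of $\bar q$ implies $g_q'(\bar q) = 0$, and hence, thanks to (7), $\frac{f'(\bar q)}{f(\bar q)} = -\frac{q-1}{\bar q}$. Let $q_1 > q_2$ be two real numbers. In order to show that $\bar q_2 < \bar q_1$, it is enough to prove that $g_{q_1}'(\bar q_2) > 0$. This holds true because

$$g_{q_1}'(\bar q_2) = \bar q_2^{\,q_1-2}\big(f'(\bar q_2)\,\bar q_2 + f(\bar q_2)(q_1-1)\big) > \bar q_2^{\,q_1-2}\big(f'(\bar q_2)\,\bar q_2 + f(\bar q_2)(q_2-1)\big) = \bar q_2^{\,q_1-q_2}\, g_{q_2}'(\bar q_2) = 0.$$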


To find an expansion for $\bar q$, let us recall the following classical lower bound on $\operatorname{erfc}(x)$ for $x > 0$:

$$\frac{1}{\sqrt{\pi}}\,\frac{2}{x + (x^2+2)^{1/2}} < \exp(x^2)\operatorname{erfc}(x).$$


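As the function $u = \operatorname{erfc}^\eta$ takes its values in $(0,1]$, we have $\frac{\eta K}{1+K} \le \frac{\eta K}{u+K} < \eta$. Consequently, since $\operatorname{erfc}' < 0$, identity (8) and the lower bound above yield

$$\frac{f'(x)}{f(x)} > \eta\,\frac{\operatorname{erfc}'(x)}{\operatorname{erfc}(x)} > -\eta\big(x + (x^2+2)^{1/2}\big), \qquad x > 0. \tag{10}$$

Let $q > 1$ and let $x_q > 0$ be the solution of $\eta\,x\big(x + (x^2+2)^{1/2}\big) = q-1$, namely $x_q = \frac{q-1}{\sqrt{2\eta(\eta+q-1)}}$. The inequality $-\frac{q-1}{x} \le -\eta\big(x + (x^2+2)^{1/2}\big)$ is equivalent to $\eta\,x\big(x + (x^2+2)^{1/2}\big) \le q-1$. A simple computation shows that this inequality is satisfied for $x = x_q$, where it becomes an equality. Thanks to (10), we conclude that $x_q$ satisfies

$$\frac{f'(x_q)}{f(x_q)} > -\frac{q-1}{x_q},$$

which leads to $\bar q > x_q$, by definition of $\bar q$ and by (7). This last inequality implies that $\bar q$ tends to $+\infty$ as $q$ tends to $+\infty$. Finally, from (9) and the identity $\frac{f'(\bar q)}{f(\bar q)} = -\frac{q-1}{\bar q}$, we get the asymptotic behavior of $\bar q$, namely

$$\bar q = \left(\frac{q}{2\eta}\right)^{1/2} + o\big(q^{1/2}\big) \quad \text{as } q \to \infty.$$

This completes the proof of Lemma 2.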
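Lemma 2 can also be explored numerically. In the sketch below (our illustration; the values $\eta = K = 1$ are arbitrary), $\bar q$ is computed as the root of $\varphi(x) = x f'(x)/f(x) + (q-1)$, using identity (8) for $f'/f$. Bisection is justified because, for $q > 1$, $\varphi$ is strictly decreasing. The factor $x\operatorname{erfc}'(x)/\operatorname{erfc}(x)$ is negative and decreasing by log-concavity of erfc, while $K/(u+K)$ increases. The default bracket $[10^{-12}, 20]$ is only valid for moderate $q$ (roughly below 800 when $\eta = 1$). The test also checks the lower bound $\bar q > x_q$ from the proof.

```python
import math

def log_deriv_f(x, eta, K):
    """f'(x)/f(x) for f = erfc^eta / (erfc^eta + K), via identity (8)."""
    e = math.erfc(x)
    u = e ** eta
    erfc_prime = -2.0 / math.sqrt(math.pi) * math.exp(-x * x)
    return eta * (erfc_prime / e) * K / (u + K)

def q_bar(q, eta=1.0, K=1.0, lo=1e-12, hi=20.0, tol=1e-12):
    """Unique zero of phi(x) = x f'(x)/f(x) + (q - 1) for q > 1, by bisection."""
    phi = lambda x: x * log_deriv_f(x, eta, K) + (q - 1.0)
    if not (phi(lo) > 0.0 > phi(hi)):
        raise ValueError("bracket does not contain the root; adjust lo/hi")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if phi(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for q in (10.0, 50.0, 200.0):
    qb = q_bar(q)
    print(q, qb, qb / math.sqrt(q / 2.0))  # last ratio should approach 1 (eta = 1)
```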
Proof of Theorem 5
We are now in a position to conclude the proof of Theorem 5. To do so, we begin by 
introducing 


I(x) H(coerfo(x)) = f (x) Liza + K locxs, 


where f(x) = LE" ag = erfc7! (2), and K =1. A brief calculation 


erfe(x)"+c9" Kio? co 
shows that G and J, and their corresponding Mellin transforms are related as follows 


1 1 
G(x) = J | —— d MG(s) = ——————— 11 
e (z x) dd 234 DIMJ(=s) P 


Thus, in terms of J, Eq. (3) becomes 


2°71 MI[p] (s/2) 
VD" MJ(<s) 


From the estimate for erfc at $+\infty$ given in the proof of Lemma 2, the function $J_1$ is in $L^1_k$ for every $k > -1$. Thus $MJ_1$ is holomorphic on the right half-plane, see Proposition 1. Using Lemma 3.2 in [1] on the vertical line $\frac{1-r}{2} + i\mathbb{R}$ with $\frac{1-r}{2} > 0$, one deduces that bounding $MJ(-s)$ amounts to estimating $|s\,MJ_1(s)|$, from above or from below, on the vertical lines $q + i\mathbb{R}$, for $q > 0$. The Mellin transform of $J_1$ at $s = q + it$ is given by


Mp(s + 1) = (12) 


$$MJ_1(s) = K\int_0^\alpha x^{s-1}\,dx + \int_\alpha^{+\infty} f(x)\,x^{s-1}\,dx = K\,\frac{\alpha^s}{s} + \int_\alpha^{+\infty} f(x)\,x^{s-1}\,dx.$$


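For any $\alpha > 0$, $q > 0$ and $s \in q + i\mathbb{R}$ we have

$$|MJ_1(s)| \le K\,\frac{\alpha^q}{q} + \int_\alpha^{+\infty} f(x)\,x^{q-1}\,dx,$$

which is finite. Let $q > 0$. According to Lemma 2, the function $x \mapsto f(x)\,x^{q-1}$ is decreasing for $x > x_0$. Let $a < c_0\operatorname{erfc}(x_0)$, so that $\alpha = \operatorname{erfc}^{-1}(a/c_0) > x_0$. Let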
g(x) = Fogar! I>a. For every te IR, [fox |, = 0 because f vanishes 
for x < o and xo < a, and g(x) = A y nq Mo oni +o geet), Then 
Corollary 1 can be applied to the function $g$, with $A = \alpha$, $B = +\infty$, for $s \in q + i\mathbb{R}$, to give
$$|s\,MJ_1(s)| \le K|\alpha^s| + \frac{|s|}{\sqrt{1+t^2}}\int_\alpha^{+\infty} f(x)\,x^{q-1}\,dx \le K\alpha^q + \max(1, q)\int_\alpha^{+\infty} f(x)\,x^{q-1}\,dx < \infty,$$

because $\frac{|s|^2}{1+t^2} = \frac{q^2+t^2}{1+t^2}$ lies in $[q^2, 1]$ or in $[1, q^2]$, according as $q \le 1$ or $q \ge 1$. For small values of $a$, the first term dominates the second one. The same calculation as above leads to

$$|s\,MJ_1(s)| \ge K\alpha^q - \max(1, q)\int_\alpha^{+\infty} f(x)\,x^{q-1}\,dx.$$

This latter expression is equivalent to $K\alpha^q$ as $\alpha$ tends to $+\infty$; therefore, it is positive for large values of $\alpha$.


4 Unstable Identifiability, Non-Existence of Observability Inequalities


Since the French-Groetsch model is also a Fredholm integral equation of the first kind, it is natural to apply a Mellin transform here too. This leads to interesting results: neither an observability inequality nor a proper numerical algorithm for recovering $\rho$ can be established. However, an identifiability result holds whenever the current is measured over an open time interval (see the Identifiability Theorem below).


Defining G as 
E 1 
G(z) = P (o ete )) y 
"5 MD 


and rescaling time $t$ in $I$, we obtain a convolution equation very similar to (3):


1 Mio] (s/2) 


Mp(s +) — MG) 


(13) 


A close study of the transform $MG(s)$ allows us to establish the following two theorems, which provide information about the behavior of the inverse problem associated with model (1). The proofs of Theorems 3 and 4 require extending the Mellin transform to functions in the Schwartz space and proving that the Mellin transforms of such smooth and rapidly decreasing functions decay faster than any polynomial on vertical lines.¹


¹ The interested reader is referred to [1] for details on how this can be done and for detailed proofs of Theorems 3 and 4.
Theorem 3 (Non Observability) Let $r < 1$ be fixed. For every non-negative integer $k$ there exists no constant $C_k > 0$ such that the observability inequality

$$\|I^{(k)}[\rho]\|_{L^2((0,\infty),\, t^{2k+\frac{r-3}{2}})} \ge C_k \|\rho\|_{L^2_r}$$

holds for every function $\rho \in L^2([0,\infty), x^r)$.
Note that this result shows that, if the inverse problem were identifiable (i.e., if $I$ were injective), then $I^{-1}$ could not be continuous.
Theorem 4 (Identifiability) Let $r < 0$ and $\rho \in L^2([0,\infty), x^r)$ be arbitrary. If there exists a nonempty open subset $U$ of $(0,\infty)$ such that $I[\rho](t) = 0$ for all $t \in U$, then $\rho = 0$ almost everywhere on $(0,\infty)$.


The interested reader is referred to [1, §4 and §5] for various numerical experiments associated with the different theoretical results of this paper. In particular, Theorems 5 and 6 are graphically illustrated in the quoted reference with data extracted from laboratory experiments carried out by Chen et al. [2] in the 1990s.


A Path Forward 

The Mellin transform has been successful in mathematically analyzing models (1) 
and (2), allowing us to answer questions of existence (observability), uniqueness and 
identifiability of the distribution of ion channels along a cilium, as well as stability 
issues associated with both direct and inverse problems in these models. However, 
from a more holistic scientific point of view, not a purely mathematical one, the big 
question does not seem to be exactly this. Rather, it is about whether, by using and 
studying these models, Mathematics truly helps to improve our understanding of the 
olfactory system and, in general terms, the real world. In this sense, Kleene's exper- 
iments have been a great contribution, albeit insufficient. Much stronger validation 
of the models is required, which can only be achieved by forming multidisciplinary 
teams and designing ad-hoc experiments. 


Acknowledgements C. C. is partially supported by PFBasal-001 and AFBasal170001 projects, and by the Regional Program STIC-AmSud Project NEMBICA-20-STIC-05.


References 


1. Bourgeron, T., Conca, C., Lecaros, R.: Determining the distribution of ion channels from experimental data. Math. Mod. Numer. Anal. (ESAIM: M2AN) 52, 2083-2107 (2018)

2. Chen, C., Nakamura, T., Koutalos, Y.: Cyclic AMP diffusion coefficient in frog olfactory cilia. 
Biophys. J. 76, 2861-2867 (1999) 

3. Conca, C., Lecaros, R., Ortega, J.H., Rosier, L.: Determination of the calcium channel distribution in the olfactory system. J. Inverse Ill-Posed Probl. 22, 671-711 (2014)

4. French, D.A., Flannery, R.J., Groetsch, C.W., Krantz, W.B., Kleene, S.J.: Numerical approxi- 
mation of solutions of a nonlinear inverse problem arising in olfaction experimentation. Math. 
Comput. Model. 43, 945-956 (2006) 


Modelling Our Sense of Smell 55 


5. Kleene, S.J.: Origin of the chloride current in olfactory transduction. Neuron 11, 123-132 
(1993) 

6. Kleene, S.J., Gesteland, R.C.: Transmembrane currents in frog olfactory cilia. J. Membr. Biol. 
120, 75-81 (1991) 

7. Lindelöf, E.: Robert Hjalmar Mellin. Acta Math. 61, i-vi (1933) 


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 
International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, 
adaptation, distribution and reproduction in any medium or format, as long as you give appropriate 
credit to the original author(s) and the source, provide a link to the Creative Commons license and 
indicate if changes were made. 

The images or other third party material in this chapter are included in the chapter’s Creative 
Commons license, unless indicated otherwise in a credit line to the material. If material is not 
included in the chapter’s Creative Commons license and your intended use is not permitted by 
statutory regulation or exceeds the permitted use, you will need to obtain permission directly from 
the copyright holder. 


State Estimation—The Role of Reduced Models


Albert Cohen, Wolfgang Dahmen, and Ron DeVore 


Abstract The exploration of complex physical or technological processes usually 
requires exploiting available information from different sources: (i) physical laws
often represented as a family of parameter dependent partial differential equations
and (ii) data provided by measurement devices or sensors. The number of sensors
is typically limited and data acquisition may be expensive and in some cases even 
harmful. This article reviews some recent developments for this “small-data” scenario 
where inversion is strongly aggravated by the typically large parametric dimensionality. The proposed concepts may be viewed as exploring alternatives to Bayesian
inversion in favor of more deterministic accuracy quantification related to the required
computational complexity. We discuss optimality criteria which delineate intrinsic 
information limits, and highlight the role of reduced models for developing efficient 
computational strategies. In particular, the need to adapt the reduced models—not 
to a specific (possibly noisy) data set but rather to the sensor system—is a central 
theme. This, in turn, is facilitated by exploiting geometric perspectives based on 
proper stable variational formulations of the continuous model. 


1 Introduction 


Modern sensor technology and data acquisition capabilities generate an ever increas- 
ing wealth of data about virtually every branch of science and social life. Machine 
learning offers novel techniques for extracting quantifiable information from such 
large data sets. While machine learning has already had a transformative impact on 
a diversity of application areas in the “big-data” regime, particularly in image classification and artificial intelligence, it is yet to have a similar impact in many other
areas of science. 

Utilizing data observations in the analysis of scientific processes differs from tra- 
ditional learning in that one has the additional information that these processes are 
described by mathematical models—systems of partial differential equations (PDE) 
or integral equations—that encode the physical laws that govern the process. Such 
models, however, are often deficient, inaccurate, incomplete or need to be further cal- 
ibrated by determining a large number of parameters in order to accurately represent 
an observed process. Typical guiding examples are Darcy’s equation for the pressure
in ground-water flow or electrical impedance tomography. Both are based on second
order elliptic equations as core models. The diffusion coefficients in these examples
describe permeability or conductivity, respectively. The parametric representations
of the coefficients could arise, for instance, from Karhunen-Loève expansions of a
random field that represent “unresolvable” features to be captured by the model. In 
this case the number of parameters could actually be infinite. 

The use of machine learning to describe complex states of interest or even the 
underlying laws, solely through data, seems to bear little hope. In fact, data acquisi- 
tion is often expensive or even harmful as in applications involving radiation. Thus, a 
severe undersampling poses principal obstructions to state or parameter estimation 
by solely processing observational data through standard machine learning tech- 
niques. It is therefore more natural to try to effectively combine the data information 
with the knowledge of the underlying physical laws represented by parameter depen- 
dent families of PDEs. 

Methods that fuse together data-driven and model-based approaches fall roughly 
into two categories. One prototype of a data assimilation scenario arises in meteorol- 
ogy where data are used to stabilize otherwise chaotic dynamical systems, typically 
with the aid of (stochastic) filtering techniques. A second setting, in line with the 
above examples, uses an underlying stable continuous model to regularize otherwise 
ill-posed estimation tasks in a “small-data” scenario. Bayesian inversion is a promi- 
nent way of regularizing such problems. It relaxes the estimation task to asking only 
for posterior probabilities of states or parameters to explain given observations. 

The present article reviews some recent developments on data driven state and 
parameter estimation that can be viewed as seeking alternatives to Bayesian inver- 
sion by placing a stronger focus on deterministic uncertainty quantification and its 
relation to computational complexity. The emphasis is on foundational aspects such 
as the optimality of algorithms (formulated in an appropriate sense) when treating 
estimation tasks for “small-data” problems in high-dimensional parameter regimes. 
Central issues concern the role of reduced modeling and the exploitation of intrinsic 
problem metrics provided by the variational formulation of the underlying continuous family of PDEs. This is used by the so-called Parametrized Background
Data-Weak (PBDW) framework, introduced in [20] and further analyzed in [4], to 
identify a suitable trial (Hilbert) space U that accommodates the states and eventually 
also the data. An important point is to distinguish between the data and correspond- 
ing sensors—here linear functionals in the dual U’ of U—from which the data are 
generated. This will be seen to actually open a geometric perspective that sheds light
on intrinsic estimation limits. Moreover, in the deterministic setting, a pivotal role
is played by the so-called solution manifold, which is the set of all states that can be
attained when the parameters in the PDE traverse the whole parameter domain. 

Even with full knowledge of a state in the solution manifold, to infer from it a 
corresponding parameter is a nonlinear, severely ill-posed problem, typically formulated as a non-convex optimization problem. On the other hand, state estimation from
data is a linear, and hence more benign, inversion task which, under the current
premises, suffers mainly from severe undersampling. We will, however, indicate how to
reduce, under certain circumstances, the latter to the former problem so as to end up 
with a convex optimization problem. This motivates focusing in what follows mainly 
on state estimation. A central question then becomes how to best invoke knowledge 
on the solution manifold to regularize the estimation problem without introducing 
unnecessarily ambiguous bias. Our principal viewpoint is to recast state estimation 
as an optimal recovery problem which then naturally leads one to explore the role 
and potential of reduced modeling. 

The layout of the paper is as follows. Section 2 describes the conceptual framework for state estimation as an optimal recovery task. This formulation allows the
identification of lower bounds for the best achievable recovery accuracy. 

Section 3 reviews recent developments concerning a certain affine recovery
scheme and highlights the role of reduced models adapted to the recovery task. 
The overarching theme is to establish certified recovery bounds. When striving for 
optimality of such affine recovery maps, high parameter dimensionality is identified 
as a major challenge. We outline a recent remedy that avoids the Curse of Dimen- 
sionality by trading deterministic accuracy guarantees against analogs that hold with 
quantifiable high probability. 

Even optimal affine reduced models cannot, in general, be expected to realize the
benchmarks identified in Sect. 2. To put the results in Sect. 3 in proper perspective, we
comment in Sect. 4 on ongoing work that uses the results on affine reduced models
and corresponding estimators as a central building block for nonlinear estimators. 
We also indicate briefly some ramifications on parameter estimation. 


2 Models and Data 


2.1 The Model 


Technological design or simulating physical processes is often based on continuum 
models given by a family 
$$R(u, y) = 0, \qquad y \in Y, \tag{2.1}$$


of partial differential equations (PDEs) that depend on parameters $y$ ranging over
a parameter domain $Y \subset \mathbb{R}^{d_y}$. We will always assume uniform well-posedness of
(2.1): for each $y \in Y$, there exists a unique solution $u = u(y)$ in some trial Hilbert
space $U$ which satisfies $R(u(y), y) = 0$.
Specifically, we consider only linear problems of the form $B_y u = f$, that is,


$$R(u, y) = f - B_y u. \tag{2.2}$$


Here $f$ belongs to the dual $V'$ of a suitable test space $V$ and $B_y$ is a linear operator
acting from $U$ to $V'$ that depends on $y \in Y$. Here, uniform well-posedness then means
that $B_y$ is boundedly invertible with bounds independent of $y$. By the Babuška-Banach-Nečas Theorem, this is equivalent to saying that the bilinear form


$$(u, v) \mapsto b_y(u, v) := (B_y u)(v) \tag{2.3}$$


satisfies the following continuity and inf-sup conditions 


$$\sup_{u \in U}\sup_{v \in V} \frac{b_y(u,v)}{\|u\|_U\|v\|_V} \le C_b \quad \text{and} \quad \inf_{u \in U}\sup_{v \in V} \frac{b_y(u,v)}{\|u\|_U\|v\|_V} \ge c_b > 0, \qquad y \in Y, \tag{2.4}$$


together with the property that $b_y(u, v) = 0$ for all $u \in U$ implies $v = 0$ (injectivity of $B_y^*$).
The relevance of this stability notion lies in the entailed validity of the error-residual 
relation 


$$C_b^{-1}\|f - B_y v\|_{V'} \le \|u(y) - v\|_U \le c_b^{-1}\|f - B_y v\|_{V'}, \qquad v \in U,\ y \in Y, \tag{2.5}$$

where $\|g\|_{V'} := \sup\{g(v) : \|v\|_V = 1\}$. Thus, errors in the trial norm are equivalent
to residuals in the dual test norm, which will be exploited in what follows.
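A small discrete stand-in makes (2.5) concrete. This is our assumption, not the authors' setting. Take $U = V = \mathbb{R}^n$ with the Euclidean norm, and replace $B_y$ by the standard finite-difference matrix of $-u''$ on $(0,1)$ with zero boundary values. This matrix is symmetric positive definite with eigenvalues $\frac{4}{h^2}\sin^2\frac{k\pi h}{2}$, $k = 1, \dots, n$, $h = 1/(n+1)$. Hence $c_b$ and $C_b$ are its smallest and largest eigenvalues, and the two-sided bound can be checked for arbitrary trial vectors $v$.

```python
import math
import random

def apply_laplacian(v, h):
    """(B v)_i = (-v_{i-1} + 2 v_i - v_{i+1}) / h^2 with v_0 = v_{n+1} = 0."""
    n = len(v)
    out = []
    for i in range(n):
        left = v[i - 1] if i > 0 else 0.0
        right = v[i + 1] if i < n - 1 else 0.0
        out.append((2.0 * v[i] - left - right) / h**2)
    return out

def norm(x):
    return math.sqrt(sum(xi * xi for xi in x))

n = 50
h = 1.0 / (n + 1)
c_b = 4.0 / h**2 * math.sin(math.pi * h / 2) ** 2      # smallest eigenvalue
C_b = 4.0 / h**2 * math.sin(n * math.pi * h / 2) ** 2  # largest eigenvalue

rng = random.Random(0)
u = [rng.uniform(-1, 1) for _ in range(n)]  # "exact" state
f = apply_laplacian(u, h)                   # data consistent with u

bounds = []
for _ in range(20):
    v = [ui + rng.gauss(0, 0.1) for ui in u]  # trial approximation
    residual = [fi - bi for fi, bi in zip(f, apply_laplacian(v, h))]
    err, res = norm([ui - vi for ui, vi in zip(u, v)]), norm(residual)
    bounds.append((res / C_b, err, res / c_b))  # (lower bound, error, upper bound)
print(all(lo <= e <= hi for lo, e, hi in bounds))  # True
```

The ratio $C_b/c_b$, the condition number of the discrete operator, measures how tightly the residual pins down the error. Here it grows like $h^{-2}$, which is why stable variational settings with mesh-independent constants matter.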

For a wide range of problems such as space-time variational formulations, e.g. 
of parabolic or convection-diffusion problems, indefinite or singularly perturbed 
problems, the identification of a suitable pair U, V that guarantees stability in the 
above sense is not entirely straightforward. In particular, trial and test space may 
have to differ from each other, see e.g. [6, 11, 17, 23] for examples as well as some 
general principles. 

The simplest example, used for illustration purposes, is the elliptic family 


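$$R(u, y) = f + \operatorname{div}\big(a(y)\nabla u\big), \tag{2.6}$$

set in $\Omega \subset \mathbb{R}^{d_x}$ where $d_x \in \{1, 2, 3\}$, with boundary conditions $u|_{\partial\Omega} = 0$. Uniform
well-posedness then follows for $U = V = H^1_0(\Omega)$ if we have, for some fixed constants
$0 < r \le R < \infty$, the bounds

$$r \le a(x, y) \le R, \qquad (x, y) \in \Omega \times Y, \tag{2.7}$$

readily implying (2.4).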


Aside from well-posedness, a second important structural property of the model 
(2.1) is affine parameter dependence. By this we mean that 
$$B_y u = B_0 u + \sum_{j=1}^{d_y} y_j B_j u, \qquad y = (y_j)_{j=1}^{d_y} \in Y, \tag{2.8}$$


where the operators $B_j : U \to V'$ are independent of $y$. In turn, the residual has a
similar affine dependence structure

$$R(u, y) = R_0(u) + \sum_{j=1}^{d_y} y_j R_j(u), \qquad R_0(u) = f - B_0 u, \quad R_j = -B_j. \tag{2.9}$$


For the example (2.6) such a structure is encountered for affine parametric representations of the diffusion coefficients

$$a(x, y) = a_0(x) + \sum_{j=1}^{d_y} y_j\,\theta_j(x), \qquad (x, y) \in \Omega \times Y, \tag{2.10}$$


i.e., the field $a$ is expanded in terms of some given spatial basis functions $\theta_j$. As
indicated earlier, the pressure equation in Darcy’s law for porous media flow is
an example for (2.6) where the diffusion coefficient $a(y)$ of the form (2.10) may
arise from a stochastic model for permeability via a Karhunen-Loève expansion. In
this case (upon proper normalization) $y \in [-1, 1]^{d_y}$ has, in principle, infinitely many
entries, that is $d_y = \infty$. However, due to (2.7), the $\theta_j$ should then have some decay
as $j \to \infty$, which means that the parameters become less and less important when $j$
increases. Another example is electrical impedance tomography involving the same
type of elliptic operator, where parametric expansions represent possible variations of
conductivity, often modeled as piecewise constants, i.e., the $\theta_j$ could be characteristic
functions subordinate to a partition of $\Omega$. In this case data are acquired through
sensors that act through trace functionals, greatly adding to ill-posedness.
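The practical value of the affine structure (2.8)-(2.10) is that the parameter-independent pieces can be assembled once and then recombined for every new $y$. The one-dimensional sketch below rests on our own assumptions:

- $\Omega = (0,1)$, with a standard finite-difference discretization of $-(a u')'$ and zero boundary values.
- $a_0 = 1$ and hypothetical $\theta_j = \tfrac12 \mathbf{1}_{\Omega_j}$ on four equal subintervals $\Omega_j$, i.e. piecewise constants as in the impedance tomography example.
- With these choices, (2.7) holds with $r = 1/2$ and $R = 3/2$ for $y \in [-1,1]^4$.

Tridiagonal matrices are stored as (diagonal, off-diagonal) pairs and solved with the Thomas algorithm.

```python
def stiffness(a_mid, h):
    """FD matrix of -(a u')' with zero Dirichlet data, stored as (diag, off).
    a_mid[i] = a(x_{i+1/2}), i = 0..n, for n interior nodes."""
    n = len(a_mid) - 1
    diag = [(a_mid[i] + a_mid[i + 1]) / h**2 for i in range(n)]
    off = [-a_mid[i + 1] / h**2 for i in range(n - 1)]
    return diag, off

def combine(mats, coeffs):
    """Linear combination sum_j coeffs[j] * mats[j] of tridiagonal matrices."""
    diag = [sum(c * m[0][i] for c, m in zip(coeffs, mats)) for i in range(len(mats[0][0]))]
    off = [sum(c * m[1][i] for c, m in zip(coeffs, mats)) for i in range(len(mats[0][1]))]
    return diag, off

def thomas(diag, off, rhs):
    """Solve a symmetric tridiagonal system (no pivoting; fine for SPD matrices)."""
    n = len(diag)
    d, r = list(diag), list(rhs)
    for i in range(1, n):
        w = off[i - 1] / d[i - 1]
        d[i] -= w * off[i - 1]
        r[i] -= w * r[i - 1]
    x = [0.0] * n
    x[-1] = r[-1] / d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = (r[i] - off[i] * x[i + 1]) / d[i]
    return x

n, n_par = 99, 4
h = 1.0 / (n + 1)
mids = [(i + 0.5) * h for i in range(n + 1)]

def theta(j, x):
    """Hypothetical basis function: 1/2 on the j-th of n_par equal subintervals."""
    return 0.5 if j / n_par <= x < (j + 1) / n_par else 0.0

# Offline: parameter-independent pieces B_0 (from a_0 = 1) and B_j (from theta_j).
B0 = stiffness([1.0] * (n + 1), h)
Bj = [stiffness([theta(j, x) for x in mids], h) for j in range(n_par)]

# Online: for a given y, B_y = B_0 + sum_j y_j B_j, with no re-assembly.
y = [0.9, -0.7, 0.3, -1.0]
By = combine([B0] + Bj, [1.0] + y)
direct = stiffness([1.0 + sum(yj * theta(j, x) for j, yj in enumerate(y)) for x in mids], h)
u = thomas(*By, [1.0] * n)  # solve B_y u = f with f = 1
print(max(abs(p - q) for p, q in zip(By[0] + By[1], direct[0] + direct[1])))
```

The online step only touches $d_y + 1$ precomputed matrices. This is what makes repeated solves, and reduced models built on top of them, cheap when $y$ changes.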
A central role in the subsequent discussion is played by the solution manifold 


$$M = u(Y) := \{u(y) : y \in Y\} \tag{2.11}$$


which is then the range of the parameter-to-solution map $u : y \mapsto u(y)$ comprised of
all states that can be attained when y traverses Y. Without further mention, M will 
be assumed to be compact which actually follows under standard assumptions met 
in all above mentioned examples. 

Estimating states in M or corresponding parameters from measurements requires 
the efficient approximation of elements in M. A common challenge encountered 
in all such models lies in the inherent high-dimensionality of the states $u = u(\cdot, y)$
as functions of $d_x$ spatial variables $x \in \Omega$ and $d_y \ge 1$ parametric variables $y \in Y$.
In particular, when $d_y = \infty$ any calculation, of course, has to work with finitely
many “activated” parameters whose number, however, has to be coordinated with the 
spatial resolution of a numerical scheme to retain model-consistency. It is especially 
this issue that hinders standard approaches based on first discretizing the parametric
model because rigorously balancing spatial and parametric uncertainties becomes 
then difficult. 

What renders such problem scenarios nevertheless numerically tractable is a fur- 
ther property that will be implicitly assumed in what follows, namely that the Kol- 
mogorov n-widths of the solution manifold 


$$d_n(M)_U := \inf_{\dim U_n = n}\ \sup_{u \in M}\ \inf_{v \in U_n} \|u - v\|_U \tag{2.12}$$

exhibit at least some algebraic decay

$$d_n(M)_U \lesssim n^{-s} \tag{2.13}$$

for some $s > 0$, see [13] for a comprehensive account.
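Decay of the form (2.13) can be probed empirically. The sketch below is our own illustration and rests on two assumptions. First, the state space is replaced by $\mathbb{R}^{101}$ (grid values on $[0,1]$) with the Euclidean inner product. Second, the manifold is replaced by a finite sample of the hypothetical family $u_y(x) = 1/(1+yx)$, $y \in [0, 10]$. A greedy selection repeatedly adds the worst-approximated snapshot to the basis, which yields a concrete $n$-dimensional space for each $n$. The recorded worst-case errors are therefore upper bounds for the $n$-widths of the sampled set, not the $n$-widths of $M$ itself.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def residual(v, basis):
    """v minus its orthogonal projection onto an orthonormal basis
    (two passes of Gram-Schmidt for numerical robustness)."""
    r = list(v)
    for _ in range(2):
        for q in basis:
            c = dot(r, q)
            r = [ri - c * qi for ri, qi in zip(r, q)]
    return r

def greedy_widths(snapshots, n_max, tol=1e-12):
    """Greedy basis selection; returns worst-case projection errors e_0 >= e_1 >= ..."""
    basis, errors = [], []
    for _ in range(n_max + 1):
        res = [residual(s, basis) for s in snapshots]
        norms = [math.sqrt(dot(r, r)) for r in res]
        worst = max(range(len(snapshots)), key=norms.__getitem__)
        errors.append(norms[worst])
        if norms[worst] < tol:
            break
        basis.append([ri / norms[worst] for ri in res[worst]])
    return errors

xs = [i / 100 for i in range(101)]
ys = [10.0 * k / 49 for k in range(50)]
snapshots = [[1.0 / (1.0 + y * x) for x in xs] for y in ys]
errs = greedy_widths(snapshots, n_max=10)
for n, e in enumerate(errs):
    print(n, e)
```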

For instance, this is known to be the case for elliptic models (2.6) with (2.7), as 
a consequence of the results of sparse polynomial approximation of the parameter 
to solution map y > u(y) established e.g. in [15]. More generally, (2.13) can be 
established under a general holomorphy property of the parameter to solution map, 
as a consequence of a similar algebraic decay assumed on the n-widths of the parameter set, see [14]. For a fixed finite number $d_y < \infty$ of parameters, under certain
structural assumptions on the parameter representations (e.g. piecewise constants on 
checkerboard partitions) one can even establish (sub-) exponential decay rates, see 
[2] for more details. Assuming $s$ in (2.13) to have a “substantial” size for any range
of $d_y$ is therefore justified.

In summary, the results discussed below are valid and practically feasible for well-posed linear models (2.4) with affine parameter dependence (2.9) whose solution
manifolds have rapidly decaying n-widths (2.13).


2.2 The Data 


Suppose we are given data $w = (w_1, \dots, w_m)^T \in \mathbb{R}^m$ representing observations of
an unknown state $u \in U$ obtained through $m$ linearly independent linear functionals
$\ell_i \in U'$, i.e.,

$$w_i = \ell_i(u), \qquad i = 1, \dots, m. \tag{2.14}$$


Since in real applications data acquisition may be costly or harmful we assume 
that m is fixed. The central task to be discussed in what follows is to recover from 
this information an estimate for the observed unknown state u, based on the prior 
assumption that u belongs to M or is close to M. Moreover, to bring out the essence 
of this estimation task we assume for the moment that the data are noise-free. 

Following [4, 20], we first recast the data in a “compliant” metric, by introducing
the Riesz representers $\psi_i \in U$, defined by

$$\langle \psi_i, v \rangle_U = \ell_i(v), \qquad v \in U, \quad i = 1, \dots, m.$$


The $\psi_i$ now span the $m$-dimensional subspace $W \subset U$, which we refer to as measurement space, and the information carried by the $\ell_i(u)$ is equivalent to that of the
orthogonal projection $P_W u$ of $u$ onto $W$. The decomposition

$$u = P_W u + P_{W^\perp} u, \qquad u \in U, \tag{2.15}$$

thus contains a first term that is “seen” by the sensors and a second (infinite-dimensional) term which cannot be detected. The decomposition (2.15) may be seen
as a sensor-induced “coordinate system”, thereby opening up a geometric perspective
that will prove very useful in what follows. State estimation can then be viewed as
learning from samples $w := P_W u$ the unknown “labels” $P_{W^\perp} u \in W^\perp$.
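The claim that the data carry exactly the information of $P_W u$ can be checked in a finite-dimensional stand-in. For illustration only, we assume $U = \mathbb{R}^n$ with the discrete $H^1_0$-type inner product $\langle u, v \rangle_U = u^\top G v$, where $G$ is the finite-difference matrix of $-u''$. The sensors $\ell_i$ are hypothetical local averages. Then $\psi_i = G^{-1}\ell_i$. Writing $P_W u = \sum_j c_j \psi_j$, the conditions $\langle P_W u, \psi_i\rangle_U = \ell_i(u) = w_i$ give the Gram system $\sum_j \langle \psi_j, \psi_i\rangle_U\, c_j = w_i$, which involves the data $w$ only.

```python
import random

n, m = 60, 4
h = 1.0 / (n + 1)

def G_apply(v):
    """G v for the FD matrix of -u'' (zero boundary values); <u, v>_U = u^T G v."""
    out = []
    for i in range(n):
        left = v[i - 1] if i > 0 else 0.0
        right = v[i + 1] if i < n - 1 else 0.0
        out.append((2.0 * v[i] - left - right) / h**2)
    return out

def G_solve(b):
    """Solve G x = b with the Thomas algorithm (G is SPD tridiagonal)."""
    d, off, r = [2.0 / h**2] * n, -1.0 / h**2, list(b)
    for i in range(1, n):
        w = off / d[i - 1]
        d[i] -= w * off
        r[i] -= w * r[i - 1]
    x = [0.0] * n
    x[-1] = r[-1] / d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = (r[i] - off * x[i + 1]) / d[i]
    return x

def inner_U(u, v):
    return sum(a * b for a, b in zip(u, G_apply(v)))

def dense_solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    k = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, k):
            fac = M[r][col] / M[col][col]
            M[r] = [a - fac * c for a, c in zip(M[r], M[col])]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):
        x[r] = (M[r][k] - sum(M[r][c] * x[c] for c in range(r + 1, k))) / M[r][r]
    return x

# Hypothetical sensors: averages over 5 consecutive grid values.
sensors = []
for i in range(m):
    start = (i + 1) * n // (m + 1) - 2
    sensors.append([0.2 if start <= j < start + 5 else 0.0 for j in range(n)])

psi = [G_solve(ell) for ell in sensors]  # Riesz representers
gram = [[inner_U(psi[j], psi[i]) for j in range(m)] for i in range(m)]

rng = random.Random(1)
u = [rng.uniform(-1, 1) for _ in range(n)]                 # unknown state
w = [sum(a * b for a, b in zip(ell, u)) for ell in sensors]  # data w_i = l_i(u)

c = dense_solve(gram, w)  # uses the data only
PWu = [sum(c[j] * psi[j][k] for j in range(m)) for k in range(n)]
print([inner_U([a - b for a, b in zip(u, PWu)], p) for p in psi])  # ~0: u - P_W u lies in W-perp
```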

In this article, we are interested in how well we can approximate $u$ from the
information that $u \in M$ and $P_W u = w$ with $w$ given to us. Any such approximation
is given by a mapping $A : w \mapsto A(w) \in U$. The overall performance of recovery on
all of $M$ by the mapping $A$ is typically measured in the worst case setting, that is,

$$E_{wc}(A, M, W) := \sup_{u \in M} \|u - A(P_W u)\|_U. \tag{2.16}$$


The optimal recovery error on $M$ is then defined as

$$E_{wc}(M, W) := \inf_A E_{wc}(A, M, W), \tag{2.17}$$


where the infimum is over all possible recovery maps. Let us observe that the construction of recovery maps can be restricted to be of the form

$$A : w \mapsto A(w), \qquad A(w) = w + B(w), \quad \text{with } B : W \to W^\perp. \tag{2.18}$$

Indeed, given any recovery mapping $A$, we can write $A(w) = P_W A(w) + P_{W^\perp} A(w)$.
Since $\|u - A(w)\|_U^2 = \|w - P_W A(w)\|_U^2 + \|P_{W^\perp} u - P_{W^\perp} A(w)\|_U^2$ when $P_W u = w$, the performance of the recovery can only be improved if we replace the first
term by $w$. In other words, $A(w)$ should belong to the affine space

$$U_w := w + W^\perp, \tag{2.19}$$

that contains $u$. The mappings $B$ are commonly referred to as liftings into $W^\perp$.


2.3 Optimality Criteria and Numerical Recovery 


Finding a best recovery map A attaining (2.17) is known as optimal recovery. The 
best mapping has a well-known simple theoretical description, see e.g. [21], that we 
now describe. Note first that a precise recovery of the unknown state u from the given 
information is generally impossible. Indeed, the best we can say about $u$ is that it lies
in the manifold slice

$$M_w := \{u \in M : P_W u = w\} = M \cap U_w, \tag{2.20}$$


which is comprised of all elements in $M$ sharing the same measurement $w \in W$. The
Chebyshev ball $B(M_w)$ is the smallest ball in $U$ that contains $M_w$. The best recovery
algorithm is then given by the mapping

$$A^*(w) := \operatorname{cen}(M_w), \tag{2.21}$$


that assigns to each $w \in W$ the center $\operatorname{cen}(M_w)$ of $B(M_w)$, called the Chebyshev
center of $M_w$. Then, the radius $\operatorname{rad}(M_w)$ of $B(M_w)$ is the best worst case error over
the class $M_w$. The best worst case error over $M$, which is achieved by $A^*$, is thus
given by

$$E_{wc}(M, W) = E_{wc}(A^*, M, W) = \max_{w \in W} \operatorname{rad}(M_w). \tag{2.22}$$
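A toy example, ours and not from the article, makes (2.20)-(2.22) tangible. Take $U = \mathbb{R}^2$, the hypothetical manifold $M = \{(y, y^2) : y \in [-1, 1]\}$ and $W = \operatorname{span}\{e_2\}$, so the sensor sees only $y^2$. The slice $M_w$ is the pair $(\pm\sqrt{w}, w)$. Its Chebyshev center is $(0, w)$ with radius $\sqrt{w}$, and (2.22) gives $E_{wc}(M, W) = \max_{w \in [0,1]} \sqrt{w} = 1$. The sketch recovers this from a sampled manifold. Since $M_w$ lies on a line parallel to $W^\perp$, its Chebyshev center is simply the midpoint of its extreme points.

```python
def slice_of_manifold(w, n_samples=2001, tol=1.5e-3):
    """Sampled points of M = {(y, y^2)} whose measurement P_W u = y^2 is within tol of w."""
    ys = [-1.0 + 2.0 * k / (n_samples - 1) for k in range(n_samples)]
    return [(y, y * y) for y in ys if abs(y * y - w) <= tol]

def chebyshev_center_on_line(points):
    """Chebyshev center and radius of points sharing (approximately) the same second
    coordinate: midpoint and half-length of the range of first coordinates."""
    first = [p[0] for p in points]
    lo, hi = min(first), max(first)
    center = ((lo + hi) / 2.0, sum(p[1] for p in points) / len(points))
    return center, (hi - lo) / 2.0

for w in (0.0, 0.25, 1.0):
    center, radius = chebyshev_center_on_line(slice_of_manifold(w))
    print(w, center, radius)  # expect center ~ (0, w) and radius ~ sqrt(w)
```

Here the optimal recovery simply discards the sign ambiguity and returns $(0, w)$. Its worst-case error, attained at $w = 1$, cannot be reduced by any recovery map using this single sensor.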


While the above mapping A* gives a nice theoretical description of the optimal 
recovery algorithm, it is typically not numerically implementable since the Cheby- 
shev center cen(M,,) is not easily found. Moreover, such an optimal algorithm is 
highly nonlinear and possibly discontinuous. The purpose of this section is to for- 
mulate a more modest goal for the performance of a recovery algorithm with the 
hope that this more modest goal can be met with a numerically realizable algorithm. 
The remaining sections of the paper introduce numerically implementable recovery 
mappings, analyze their performance, and evaluate the numerical cost in constructing 
these mappings. 

The search for a numerically realizable algorithm must out of necessity lessen the 
performance criteria. A first possibility is to weaken the performance criteria to near 
best algorithms. This means that we search for an algorithm A such that 


$$E_{wc}(A, M, W) \le C_0\, E_{wc}(M, W), \tag{2.23}$$


with a reasonable value of $C_0 \ge 1$. For example, any mapping $A$ which takes $w$ into
an element in the Chebyshev ball of $M_w$ is near best with constant $C_0 = 2$. However,
finding near best mappings A also seems to be numerically out of reach. 

In order to formulate a more attainable performance criterion, we return to our 
earlier observations about uncertainty in both the model class M and in the measure- 
ments w. The former is a modeling error while the latter is an inherent measurement 
error. Both of these uncertainties can be quantified by introducing, for each $\varepsilon > 0$,
the $\varepsilon$-neighborhood of the manifold

$$M^\varepsilon := \{v \in U : \operatorname{dist}(v, M)_U \le \varepsilon\}. \tag{2.24}$$
The uncertainty in the model can be thought of as saying the sought-after $u$ is in $M^\varepsilon$
rather than $u \in M$. Also, we may formulate uncertainty (noise) in the measurements
as saying that they are not measurements of a $u \in M$ but rather of some $u \in M^\varepsilon$. Here
the value of $\varepsilon$ quantifies these uncertainties.

Our new goal is to numerically construct a recovery map $A$ that is near-optimal
on $M^\varepsilon$, for some given $\varepsilon > 0$. Let us note that $M^\varepsilon$ is not compact. An algorithm $A$
is worst-case near optimal for $M^\varepsilon$ if and only if its performance is bounded by a
constant multiple of the diameter

$$\delta_\varepsilon(M, W) := \max\{\|u - v\|_U : u, v \in M^\varepsilon,\ P_W(u - v) = 0\}. \tag{2.25}$$


Notice that $\varepsilon = 0$ gives the performance criterion for near optimal recovery over $M$.
One can show that the function $\varepsilon \mapsto \delta_\varepsilon(M, W)$ is monotone non-decreasing in $\varepsilon$,
continuous from the right, and $\lim_{\varepsilon \to 0^+} \delta_\varepsilon(M, W) = \delta_0(M, W)$. The speed at which
$\delta_\varepsilon(M, W)$ approaches $\delta_0(M, W)$ reflects the “condition” of the estimation problem
depending on $M$ and $W$. While the practical realization of worst-case near-optimality
for $M^\varepsilon$ is already a challenge, quantifying the corresponding computational cost would
require assumptions on the condition of the problem.

One central theme, guiding subsequent discussions, is therefore to find recovery maps $A_\varepsilon$ that realize an error bound of the form

$$E_{\rm wc}(A_\varepsilon,\mathcal{M},W)\le C_0\,\delta_\varepsilon(\mathcal{M},W). \tag{2.26}$$

Any a priori information on measurement accuracy and model bias might be used to choose a viable tolerance $\varepsilon$.

High parametric dimensionality poses particular challenges to estimation tasks when the targeted error bounds are in the above worst-case sense. These challenges can be somewhat mitigated when adopting a Bayesian point of view [24]. The prior information on $u$ is then described by a probability distribution $p$ on $U$, which is supported on $\mathcal{M}$. Such a measure is typically induced by a probability distribution on $Y$ that may or may not be known. In the latter case, sampling $\mathcal{M}$, i.e., computing snapshots $u(y^i)$, $i=1,\dots,N$, for i.i.d. samples $y^i\in Y$, provides labeled data $(w_i,w_i^\perp)=(P_Wu(y^i),P_{W^\perp}u(y^i))$ according to the sensor-based decomposition (2.15). This puts us into the setting of regression in machine learning, asking for an estimator that predicts for any new measurement $w\in W$ its lifting $w^\perp=B(w)$. It is then natural to measure the performance of an algorithm in an averaged sense. The best estimator $A$ that minimizes the mean-square risk

$$E_{\rm ms}(A,p,W)=\mathbb{E}\big(\|u-A(P_Wu)\|_U^2\big)=\int_U\|u-A(P_Wu)\|_U^2\,dp(u) \tag{2.27}$$

is given by the conditional expectation

$$A(w)=\mathbb{E}(u\,|\,P_Wu=w). \tag{2.28}$$
Since always $E_{\rm ms}(A,p,W)\le E_{\rm wc}(A,\mathcal{M},W)^2$, the optimality benchmarks are somewhat weaker. In the rest of this paper, we adhere to the worst-case error in the deterministic setting that only assumes membership of $u$ to $\mathcal{M}$ or $\mathcal{M}^\varepsilon$.
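The difference between the averaged and the worst-case benchmarks can be seen on a tiny example of our own (not taken from the text): $U=\mathbb{R}^2$, $W=\mathrm{span}\{e_1\}$, a parameter $y$ drawn uniformly from $\{-2,-1,0,1,2\}$, and the made-up map $u(y)=(y^2,y)$, so that the measurement $w=y^2$ cannot distinguish $\pm y$. The sketch below forms the empirical conditional mean (2.28) from labeled samples and compares its mean-square risk (2.27) with its worst-case error; the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy model (not from the text): U = R^2, W = span{e_1},
# y uniform on Y, u(y) = (y^2, y). The sensor only sees w = y^2.
Y = [-2, -1, 0, 1, 2]


def snapshot(y):
    """Return the labeled pair (w, w_perp) = (P_W u(y), P_{W^perp} u(y))."""
    return (y * y, y)


def conditional_mean_estimator(samples):
    """Empirical version of (2.28): average w_perp over samples sharing w.

    The returned map sends w to the W^perp-component of the estimate; the
    W-component of A(w) is w itself, so it contributes no error.
    """
    groups = defaultdict(list)
    for w, w_perp in samples:
        groups[w].append(w_perp)
    table = {w: sum(vals) / len(vals) for w, vals in groups.items()}
    return lambda w: table[w]


samples = [snapshot(y) for y in Y]          # one sample per equally likely y
lift = conditional_mean_estimator(samples)

errors = [abs(w_perp - lift(w)) for w, w_perp in samples]
mean_square = sum(e * e for e in errors) / len(errors)   # empirical (2.27)
worst_case = max(errors)                                  # worst case over M
print(mean_square, worst_case)                            # prints 2.0 2.0
```

Here the conditional mean returns $(w,0)$, which does not lie on $\mathcal{M}$ for $w\neq0$; its risk $2$ is well below the squared worst-case error $4$, consistent with $E_{\rm ms}\le E_{\rm wc}^2$.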

The following section is concerned with an important building block on a pathway towards achieving (2.26) at quantifiable computational cost. This building block, referred to as the one-space method, is a linear (affine) scheme which is, in principle, simple and easy to implement numerically. It depends on suitably chosen subspaces. We highlight the regularizing property of these subspaces as well as ways to optimize them. This will reveal certain intrinsic obstructions caused by parameter dimensionality. The one-space method by itself will generally not achieve (2.26) but, as indicated earlier, can be used as a building block in a nonlinear recovery scheme that may indeed meet the goal (2.26).


3 The One-Space Method 


3.1 Subspace Regularization 


The one-space method can be viewed as a simple regularizer for state estimation. The resulting recovery map is induced by an $n$-dimensional subspace $U_n$ of $U$ for $n\le m$. Assume that, for each $n\ge0$, we are given a subspace $U_n\subset U$ of dimension $n$ whose distance from $\mathcal{M}$ can be assessed,

$$\mathrm{dist}(\mathcal{M},U_n)_U:=\max_{u\in\mathcal{M}}\mathrm{dist}(u,U_n)_U\le\varepsilon_n. \tag{3.1}$$

Then the cylinder

$$\mathcal{K}(U_n,\varepsilon_n):=\{u\in U:\ \mathrm{dist}(u,U_n)_U\le\varepsilon_n\} \tag{3.2}$$

contains $\mathcal{M}$ and likewise the cylinder $\mathcal{K}(U_n,\varepsilon_n+\varepsilon)$ contains $\mathcal{M}^\varepsilon$. Our prior assumption that the observed state belongs to $\mathcal{M}$ or $\mathcal{M}^\varepsilon$ can then be relaxed by assuming membership to these larger but simpler sets.

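Remarkably, one can now quite easily realize an optimal recovery map that meets the relaxed benchmark $E_{\rm wc}(\mathcal{K}(U_n,\varepsilon_n),W)$: in [4] it was shown that the Chebyshev center of the slice

$$\mathcal{K}_w(U_n,\varepsilon_n):=\mathcal{K}(U_n,\varepsilon_n)\cap U_w \tag{3.3}$$

(recall that $U_w=w+W^\perp$ collects all states compatible with the data $w$) is exactly given by the state in $U_w$ that is closest to $U_n$, that is

$$u^*=u^*(w):=\mathop{\mathrm{argmin}}_{u\in U_w}\|u-P_{U_n}u\|_U. \tag{3.4}$$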
This minimizer exists and can be shown to be unique as long as $U_n\cap W^\perp=\{0\}$. The corresponding optimal recovery map

$$A_{U_n}:\ w\mapsto u^*(w) \tag{3.5}$$

was first introduced in [20] as the Parametrized Background Data Weak (PBDW) algorithm, and is referred to as the one-space method in [4]. Due to its above minimizing property, it is readily checked that this map is linear and can be determined with the aid of the singular value decomposition of the cross-Gramian between any pair of orthonormal bases for $U_n$ and $W$.
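The minimizing property (3.4) translates directly into a small computation. Since the distance from $v\in U_n$ to $U_w=w+W^\perp$ equals $\|P_Wv-w\|_U$, the closest pair is obtained by first solving the least-squares problem $v^*=\mathrm{argmin}_{v\in U_n}\|P_Wv-w\|_U$ and then setting $u^*=w+P_{W^\perp}v^*$. The sketch below does this in $U=\mathbb{R}^N$ with the Euclidean inner product, assuming (for illustration only) that $W$ is spanned by the first $m$ coordinate vectors. It uses the normal equations rather than the SVD mentioned above, which is adequate for small, well-conditioned examples; all names and data are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(A, b):
    """Solve the square system A x = b (Gaussian elimination, partial pivoting)."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-14:
            raise ValueError("singular system: U_n intersects W^perp")
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def one_space_recovery(w, basis, N):
    """One-space (PBDW) estimate u*(w) of (3.4) in U = R^N.

    Assumption: W = span{e_0, ..., e_{m-1}} with m = len(w), so P_W keeps
    the first m entries. `basis` is any basis of U_n (n <= m).
    """
    m = len(w)
    pw = [phi[:m] for phi in basis]                  # P_W phi_j
    gram = [[dot(p, q) for q in pw] for p in pw]     # normal equations
    rhs = [dot(p, w) for p in pw]
    c = solve(gram, rhs)                             # v* = sum_j c_j phi_j
    v_star = [sum(cj * phi[k] for cj, phi in zip(c, basis)) for k in range(N)]
    return list(w) + v_star[m:]                      # u* = w + P_{W^perp} v*


# Hypothetical data: N = 4, m = 2, n = 2.
basis = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 2.0]]
u_true = [3.0, -1.0, 4.0, -2.0]          # not exactly in U_n
w = u_true[:2]                            # noiseless measurements P_W u
u_star = one_space_recovery(w, basis, 4)
print(u_star)                             # [3.0, -1.0, 3.0, -2.0]
```

The estimate reproduces the data exactly; the remaining error $\|u_{\rm true}-u^*\|=1$ comes entirely from the component of $u_{\rm true}$ that $U_n$ cannot represent.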

The worst-case error $E_{\rm wc}(\mathcal{K}(U_n,\varepsilon_n),W)$ can be described more precisely by introducing

$$\mu(U_n,W):=\sup_{v\in U_n}\frac{\|v\|_U}{\|P_Wv\|_U}, \tag{3.6}$$

which is finite if and only if $U_n\cap W^\perp=\{0\}$. This quantity, also introduced in a related but slightly different context in [1], is therefore related to the angle between the spaces $U_n$ and $W$. It becomes large when $U_n$ contains elements that are nearly perpendicular to $W$. It is actually computable: one has $\mu(U_n,W)=\beta(U_n,W)^{-1}$, where

$$\beta(U_n,W):=\inf_{v\in U_n}\sup_{w\in W}\frac{\langle v,w\rangle_U}{\|v\|_U\|w\|_U} \tag{3.7}$$

and $\beta(U_n,W)$ is the smallest singular value of the cross-Gramian between any pair of orthonormal bases for $W$ and $U_n$. It has been shown in [4, 20] that the worst-case error bound over $\mathcal{K}(U_n,\varepsilon_n)$ is given by

$$E_{\rm wc}(A_{U_n},\mathcal{K}(U_n,\varepsilon_n),W)=E_{\rm wc}(\mathcal{K}(U_n,\varepsilon_n),W)=\mu(U_n,W)\,\varepsilon_n. \tag{3.8}$$
The quantity $\mu(U_n,W)$ also coincides with the norm of the linear recovery map $A_{U_n}$. Relaxing the prior $u\in\mathcal{M}$ by exploiting information on $\mathcal{M}$ solely through approximability of $\mathcal{M}$ by $U_n$ thus implicitly regularizes the estimation task: whenever $\mu(U_n,W)$ is finite, the optimal recovery map $A_{U_n}$ is bounded and hence Lipschitz.
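One important observation is that the map $A_{U_n}$ is actually independent of $\varepsilon_n$. In particular, it achieves optimality for the smallest possible containment cylinder

$$\mathcal{K}(U_n):=\mathcal{K}(U_n,\mathrm{dist}(\mathcal{M},U_n)_U), \tag{3.9}$$

and therefore, since $E_{\rm wc}(A_{U_n},\mathcal{M},W)\le E_{\rm wc}(A_{U_n},\mathcal{K}(U_n),W)=E_{\rm wc}(\mathcal{K}(U_n),W)$,

$$E_{\rm wc}(A_{U_n},\mathcal{M},W)\le\mu(U_n,W)\,\mathrm{dist}(\mathcal{M},U_n)_U. \tag{3.10}$$

Likewise, the containment $\mathcal{M}^\varepsilon\subset\mathcal{K}(U_n,\mathrm{dist}(\mathcal{M},U_n)_U+\varepsilon)$ implies that

$$E_{\rm wc}(A_{U_n},\mathcal{M}^\varepsilon,W)\le\mu(U_n,W)\,\big(\mathrm{dist}(\mathcal{M},U_n)_U+\varepsilon\big). \tag{3.11}$$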
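The stability factor in (3.8)-(3.11) can be computed exactly as described after (3.6)-(3.7). The sketch below does so in $U=\mathbb{R}^N$ for the special case $n=2$, again assuming that $W$ is spanned by the first $m$ coordinate vectors. After orthonormalizing a basis $q_1,q_2$ of $U_n$, the cross-Gramian has entries $C_{ki}=\langle e_k,q_i\rangle$, $k<m$, and $\beta(U_n,W)^2$ is the smallest eigenvalue of the $2\times2$ matrix $C^\top C$, which has a closed form. Restricting to $n=2$ only serves to avoid a general eigenvalue solver; the function names are our own.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent vectors."""
    basis = []
    for v in vectors:
        r = list(v)
        for q in basis:
            c = dot(r, q)
            r = [ri - c * qi for ri, qi in zip(r, q)]
        norm = math.sqrt(dot(r, r))
        basis.append([ri / norm for ri in r])
    return basis


def mu(u_vectors, m):
    """mu(U_n, W) = 1 / beta(U_n, W) for dim U_n = 2 and W = first m coordinates.

    beta^2 is the smallest eigenvalue of C^T C with C[k][i] = <e_k, q_i>.
    Returns math.inf when U_n meets W^perp nontrivially (beta = 0).
    """
    q1, q2 = gram_schmidt(u_vectors)
    a = dot(q1[:m], q1[:m])
    b = dot(q1[:m], q2[:m])
    d = dot(q2[:m], q2[:m])
    lam_min = (a + d) / 2 - math.sqrt(((a - d) / 2) ** 2 + b * b)
    if lam_min <= 1e-15:
        return math.inf
    return 1.0 / math.sqrt(lam_min)


print(mu([[1, 0, 0], [0, 1, 1]], m=2))   # about 1.41421 = sqrt(2): (0,1,1) is tilted 45 degrees off W
print(mu([[1, 0, 0], [0, 0, 1]], m=2))   # inf: e_3 lies in W^perp
```

By definition, every $v\in U_n$ then satisfies $\|v\|_U\le\mu(U_n,W)\|P_Wv\|_U$, which is exactly the amplification of the approximation error appearing in (3.10).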
On the other hand, the recovery map $A_{U_n}$ may be far from optimal over the sets $\mathcal{M}$ or $\mathcal{M}^\varepsilon$. This is due to the fact that the cylinders $\mathcal{K}(U_n,\varepsilon_n)$ and $\mathcal{K}(U_n,\varepsilon_n+\varepsilon)$ may be much larger than $\mathcal{M}$ or $\mathcal{M}^\varepsilon$. In particular, it is quite possible that, for a particular observation $w$, one has $\mathrm{rad}(\mathcal{M}_w)\ll\mathrm{rad}(\mathcal{K}_w(U_n,\varepsilon_n))$. Therefore, we cannot generally expect that the one-space method achieves our goal (2.26). Moreover, the condition $n\le m$, which is necessary to avoid $\mu(U_n,W)=\infty$, limits the dimension of an approximating subspace $U_n$, and therefore $\varepsilon_n$ itself is inherently bounded from below, namely by the Kolmogorov width $d_m(\mathcal{M})_U$. The “dimension budget” therefore has to be used wisely in order to obtain good performance bounds. This typically rules out “generic approximation spaces” such as finite element spaces, and raises the question of which subspace $U_n$ yields the best estimator when applying the above method.


3.2 Optimal Affine Recovery 


The results of the previous section bring forward the question as to what is the best choice of the space $U_n$ for the given $\mathcal{M}$. On the one hand, proximity to $\mathcal{M}$ is desirable since $\mathrm{dist}(\mathcal{M},U_n)_U$ enters the error bound. However, favoring proximity may increase $\mu(U_n,W)$. Before addressing this question systematically, it is important to note that the above results carry over verbatim when $U_n$ is replaced by an affine space $U_n=\bar u+\bar U_n$, where $\bar U_n\subset U$ is a linear space. This means the reduced model $\mathcal{K}(U_n,\varepsilon_n)$ is of the form

$$\mathcal{K}(U_n,\varepsilon_n):=\bar u+\mathcal{K}(\bar U_n,\varepsilon_n).$$

The best worst-case recovery bound is now given by

$$E_{\rm wc}(\mathcal{K}(U_n,\varepsilon_n),W)=\mu(\bar U_n,W)\,\varepsilon_n. \tag{3.12}$$

Intuitively, this may help to better control the angle between $W$ and $U_n$ by anchoring the affine space at a suitable location (typically near or on $\mathcal{M}$). More importantly, it helps in localizing models via parameter domain decompositions that will be discussed later.

The one-space algorithm discussed in the previous section confines the “dimensionality” budget of the approximation spaces $U_n$ to $n\le m$. In view of (3.10), to obtain an overall good estimation accuracy, this space can clearly not be chosen arbitrarily but should be well adapted both to the solution manifold $\mathcal{M}$ and to the measurement space $W$, that is, to the given observation functionals giving rise to the data.

A simple way of adapting a recovery space to $W$ is as follows: suppose for a moment that we were able to construct, for $n=1,\dots,m$, a hierarchy of spaces $U_1^{\rm opt}\subset U_2^{\rm opt}\subset\dots\subset U_m^{\rm opt}$ that approximate $\mathcal{M}$ in a near-best way, namely

$$\mathrm{dist}(\mathcal{M},U_n^{\rm opt})_U\le C\,d_n(\mathcal{M})_U. \tag{3.13}$$
We may compute along the way the quantities $\mu(U_n^{\rm opt},W)$, then choose

$$n^*=\mathop{\mathrm{argmin}}_{n\le m}\ \mu(U_n^{\rm opt},W)\,\mathrm{dist}(\mathcal{M},U_n^{\rm opt})_U, \tag{3.14}$$

and take the map $A_{U_{n^*}^{\rm opt}}$. We sometimes refer to this choice as the “poor man’s algorithm”.
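The rule (3.14) itself is a one-line trade-off between approximation and stability. In the sketch below the values of $\mu(U_n^{\rm opt},W)$ and $\mathrm{dist}(\mathcal{M},U_n^{\rm opt})_U$ are invented numbers, chosen only to mimic the typical behavior (distances decreasing, $\mu$ growing with $n$); in practice they would come from the construction of the spaces.

```python
def poor_mans_choice(mu_values, dist_values):
    """Return (n*, bound) with n* = argmin_n mu_n * dist_n, cf. (3.14).

    mu_values[n-1] and dist_values[n-1] refer to the space U_n, n = 1..m.
    """
    bounds = [mu_n * d_n for mu_n, d_n in zip(mu_values, dist_values)]
    best = min(range(len(bounds)), key=bounds.__getitem__)
    return best + 1, bounds[best]


# Hypothetical values for m = 5: approximation improves, stability degrades.
dist_values = [0.5, 0.2, 0.08, 0.05, 0.04]
mu_values = [1.0, 1.2, 1.5, 3.0, 10.0]
n_star, bound = poor_mans_choice(mu_values, dist_values)
print(n_star, bound)   # n* = 3, bound close to 0.12
```

The largest admissible dimension $n=m$ gives the worst bound here even though its distance to $\mathcal{M}$ is the smallest, which is the point of balancing both factors.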


It is not clear, though, whether $U_{n^*}^{\rm opt}$ is indeed a near-best choice for state recovery by the one-space method. In other words, one may question whether

$$E_{\rm wc}(A_{U_{n^*}^{\rm opt}},\mathcal{M},W)\le C\inf_{\dim U'\le m}E_{\rm wc}(A_{U'},\mathcal{M},W) \tag{3.15}$$

holds with a uniform constant $C<\infty$. In fact, numerical tests strongly suggest otherwise, which motivated in [12] the following alternative to the poor man’s algorithm.

Recall that a given linear space $U_n$ determines the linear recovery map $A_{U_n}$. Likewise, a given affine space $U_n$ determines an affine recovery map $A_{U_n}$. Conversely, it can be checked that an affine recovery map $A$ determines an affine space $U_n$ that allows one to interpret the recovery scheme as a one-space method in the sense that $A=A_{U_n}$. Denoting by $\mathcal{A}$ the class of all affine mappings of the form

$$A(w)=w+z+Bw, \tag{3.16}$$

where $z\in W^\perp$ and $B\in\mathcal{L}(W,W^\perp)$ is linear, we might thus as well directly look for a mapping that minimizes

$$E_{\rm wc}(A,\mathcal{M},W):=\sup_{u\in\mathcal{M}}\|u-A(P_Wu)\|_U=\sup_{u\in\mathcal{M}}\|P_{W^\perp}u-z-BP_Wu\|_U=:e(z,B) \tag{3.17}$$

over $\mathcal{A}$, i.e., over all $(z,B)\in W^\perp\times\mathcal{L}(W,W^\perp)$. It can be shown that indeed a minimizing pair $(z^*,B^*)$ exists, i.e.,

$$e(z^*,B^*)=\min_{A\in\mathcal{A}}E_{\rm wc}(A,\mathcal{M},W)=:E_{{\rm wc},\mathcal{A}}(\mathcal{M},W),$$


see [12]. However, the minimization of $E_{\rm wc}(A,\mathcal{M},W)$ over $(z,B)\in W^\perp\times\mathcal{L}(W,W^\perp)$ is far from practically feasible. In fact, each evaluation of $E_{\rm wc}(A,\mathcal{M},W)$ requires exploring $\mathcal{M}$, and $B$ can have a range in the infinite-dimensional space $W^\perp$. In order to arrive at a computationally tractable problem, one needs to


(i) Replace $\mathcal{M}$ by a finite set $\widetilde{\mathcal{M}}\subset\mathcal{M}$ that should be sufficiently dense. Denseness can be quantified by requiring that $\widetilde{\mathcal{M}}=\widetilde{\mathcal{M}}_\delta$ is a $\delta$-net for $\mathcal{M}$ for some $\delta>0$, i.e., for any $u\in\mathcal{M}$, there exists $\tilde u\in\widetilde{\mathcal{M}}_\delta$ such that $\|u-\tilde u\|_U\le\delta$.

(ii) Choose a finite-dimensional space $U_L\subset U$ that approximates $\mathcal{M}$ to a desired precision $\mathrm{dist}(\mathcal{M},U_L)_U\le\eta$, and replace $W^\perp$ by the finite-dimensional complement

$$W_L^\perp:=U_L\ominus W \tag{3.18}$$

of $W$ in $U_L$.


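The resulting optimization problem

$$(\tilde z,\tilde B)=\mathop{\mathrm{argmin}}_{(z,B)\in W_L^\perp\times\mathcal{L}(W,W_L^\perp)}\ \max_{u\in\widetilde{\mathcal{M}}_\delta}\|P_{W_L^\perp}u-z-BP_Wu\|_U \tag{3.19}$$

can be solved by primal-dual splitting methods providing an $O(1/k)$ convergence rate [12].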
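To make (3.19) concrete, consider the smallest possible instance, which is our own illustration: $W$ and $W_L^\perp$ are both one-dimensional, so $z$ and $B$ are scalars, and each training state $u\in\widetilde{\mathcal{M}}_\delta$ is represented by the coordinate pair $(p,q)=(P_Wu,P_{W_L^\perp}u)$. For fixed $B$ the best $z$ is the midpoint of the residuals $q-Bp$, and the resulting worst-case error is a convex function of $B$, so a ternary search suffices. This deliberately replaces the primal-dual splitting of [12], which is what one would use in realistic dimensions; the function name and search interval are assumptions.

```python
def minimax_affine(points, b_lo=-10.0, b_hi=10.0, iters=200):
    """Minimize max_i |q_i - z - B p_i| over scalars (z, B), cf. (3.19).

    points: list of (p, q) = (P_W u, P_{W_L^perp} u) in coordinates, for u in
    a finite training set. Assumes the optimal B lies in [b_lo, b_hi].
    Returns (z, B, worst-case error on the training set).
    """
    def residual_range(B):
        r = [q - B * p for p, q in points]
        return max(r), min(r)

    def error(B):
        # For fixed B the optimal z is the midpoint of the residuals,
        # and the worst residual is then half their range.
        hi, lo = residual_range(B)
        return (hi - lo) / 2

    # error(B) is convex in B, so ternary search keeps a minimizer in [b_lo, b_hi].
    for _ in range(iters):
        m1 = b_lo + (b_hi - b_lo) / 3
        m2 = b_hi - (b_hi - b_lo) / 3
        if error(m1) <= error(m2):
            b_hi = m2
        else:
            b_lo = m1
    B = (b_lo + b_hi) / 2
    hi, lo = residual_range(B)
    return (hi + lo) / 2, B, (hi - lo) / 2


# Hypothetical training set: u(t) = (t, t^2), i.e. p = t is observed, q = t^2 is not.
points = [(k / 10, (k / 10) ** 2) for k in range(-10, 11)]
z, B, err = minimax_affine(points)
print(z, B, err)
```

For this parabola the minimax solution is $z=1/2$, $B=0$ with error $1/2$, attained with alternating signs at $t=-1,0,1$ (Chebyshev equioscillation), so the printed values should be close to $0.5$, $0$ and $0.5$.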

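Due to the perturbations (i) and (ii) of the ideal minimization problem, the resulting $(\tilde z,\tilde B)$ is no longer optimal. However, one can show that

$$E_{\rm wc}(\tilde A,\mathcal{M},W)\le E_{{\rm wc},\mathcal{A}}(\mathcal{M},W)+\eta+C\delta, \tag{3.20}$$

where $\tilde A$ is the affine map associated with $(\tilde z,\tilde B)$ through (3.16), and the constant $C$ is the operator norm of the map $B^*$ minimizing (3.17). On the other hand, since the range of any affine mapping $A$ is an affine space of dimension at most $m$, and is therefore contained in a linear space of dimension at most $m+1$, one always has $E_{{\rm wc},\mathcal{A}}(\mathcal{M},W)\ge d_{m+1}(\mathcal{M})_U$. Therefore $(\tilde z,\tilde B)$ satisfies a near-optimal bound

$$E_{\rm wc}(\tilde A,\mathcal{M},W)\lesssim E_{{\rm wc},\mathcal{A}}(\mathcal{M},W), \tag{3.21}$$

whenever $\eta$ and $\delta$ are picked such that

$$\eta\lesssim d_{m+1}(\mathcal{M})_U\quad\text{and}\quad\delta\lesssim d_{m+1}(\mathcal{M})_U. \tag{3.22}$$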


The numerical tests in [12] for a model problem of the type (2.6) with piecewise constant checkerboard diffusion coefficients and parameter dimension $d$ up to $d=64$ show that this recovery map exhibits significantly better accuracy than the method based on (3.14). It even yields smaller error bounds than the affine mean-square estimator (2.27). The following section discusses the numerical cost entailed by conditions like (3.22).


3.3 Rate-Optimal Reduced Bases 


To keep the dimension $L$ of the space $U_L$ in (3.18) small, a near-best subspace $U_n$ in the sense of (3.13) would be highly desirable. Likewise, the poor man’s scheme (3.14) would benefit from such subspaces. Unfortunately, such near-best subspaces are not practically accessible. The reduced basis method aims to construct subspaces which come close to near-optimality in a sense that we explain next. The main idea is to generate these subspaces by a sequence of elements picked from the manifold $\mathcal{M}$ itself, by means of a weak-greedy algorithm introduced and studied in [8]. In an idealized form, this algorithm proceeds as follows: given a current space $U_n^{\rm rb}=\mathrm{span}\{u_1,\dots,u_n\}$, one takes $u_{n+1}=u(y^{n+1})$ such that, for some fixed $\gamma\in\,]0,1]$, $\|u_{n+1}-P_{U_n^{\rm rb}}u_{n+1}\|_U\ge\gamma\max_{u\in\mathcal{M}}\|u-P_{U_n^{\rm rb}}u\|_U$, or equivalently

$$\|u(y^{n+1})-P_{U_n^{\rm rb}}u(y^{n+1})\|_U\ge\gamma\max_{y\in Y}\|u(y)-P_{U_n^{\rm rb}}u(y)\|_U. \tag{3.23}$$

Then, one defines $U_{n+1}^{\rm rb}=\mathrm{span}\{u_1,\dots,u_{n+1}\}$. While the weak-greedy algorithm unfortunately does not, in general, produce spaces satisfying (3.13), it does come close. Namely, it has been shown in [3, 19] that the spaces $U_n^{\rm rb}$ are rate-optimal in the following sense:


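(i) For any $s>0$ one has

$$d_n(\mathcal{M})_U\le C(n+1)^{-s},\ n\ge0\quad\Longrightarrow\quad\mathrm{dist}(\mathcal{M},U_n^{\rm rb})_U\le\bar C(n+1)^{-s},\ n\ge0, \tag{3.24}$$

where $\bar C$ depends on $C$, $s$, $\gamma$.

(ii) For any $\beta>0$, one has

$$d_n(\mathcal{M})_U\le Ce^{-cn^\beta},\ n\ge0\quad\Longrightarrow\quad\mathrm{dist}(\mathcal{M},U_n^{\rm rb})_U\le\bar Ce^{-\bar cn^\beta},\ n\ge0, \tag{3.25}$$

where the constants $\bar c$, $\bar C$ depend on $c$, $C$, $\beta$, $\gamma$.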


In the form described above, the weak-greedy concept seems infeasible since it would, in principle, require computing the solution $u(y)$ for all values of $y\in Y$ exactly, exploring the whole exact solution manifold. However, its practical applicability is facilitated when there exists a tight surrogate $R(y,U_n)$ satisfying

$$c_R\,R(y,U_n)\le\|u(y)-P_{U_n}u(y)\|_U=\mathrm{dist}(u(y),U_n)_U\le C_R\,R(y,U_n),\qquad y\in Y, \tag{3.26}$$

for uniform constants $0<c_R\le C_R<\infty$, which can be evaluated at affordable cost. Then, maximization of $R(y,U_n)$ over $Y$ amounts to the weak-greedy step (3.23) with $\gamma:=c_R/C_R$. According to [18], the validity of the following two conditions indeed allows one to derive computable surrogates that satisfy (3.26):


(i) The underlying parametric family of PDEs (2.1) permits a uniformly stable variational formulation (2.4), and one has affine parameter dependence (2.9);

(ii) The discrete projection $\Pi_{U_n}$ (of Galerkin or Petrov-Galerkin type) has the best approximation property, i.e., resulting errors are uniformly comparable to the best approximation error.


Conditions (i) and (ii) ensure, in view of (2.5), that $\|u(y)-P_{U_n}u(y)\|_U\sim\|\mathcal{R}(\Pi_{U_n}u(y),y)\|_{V'}$ holds uniformly in $y\in Y$. Thus,

$$R(y,U_n):=\|\mathcal{R}(\Pi_{U_n}u(y),y)\|_{V'}=\sup_{v\in V}\frac{\mathcal{R}(\Pi_{U_n}u(y),y)(v)}{\|v\|_V} \tag{3.27}$$

satisfies (3.26) and is therefore a tight surrogate for $\mathrm{dist}(u(y),U_n)_U$. In the elliptic case (2.6) under assumption (2.7), (i) and (ii) hold and the above comments reflect standard practice. For the wider scope of stable but unsymmetric variational formulations [6, 16, 23], the inf-sup conditions (2.4) imply (i), but the Galerkin projection in (ii) needs to be replaced by a stable Petrov-Galerkin projection with respect to suitable test spaces $V_n$ accompanying the reduced trial spaces $U_n$. It has been shown in [18] how to generate such test spaces with the aid of a double-greedy strategy, see also [16].
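A finite-dimensional caricature shows why a residual is a tight surrogate. The sketch below is entirely hypothetical: $U=V=\mathbb{R}^5$ with Euclidean norms, the affinely parametrized matrix $A(y)=I+y\,\mathrm{diag}(\theta)$ with $\theta_k\in[0,1]$ and $y\in[0,1]$, and $u(y)=A(y)^{-1}f$. The Galerkin approximation in $U_n$ only requires an $n\times n$ solve, and since $A(y)$ has eigenvalues in $[1,2]$, the error $e=A(y)^{-1}r$ is bracketed by the residual $r$ as $\tfrac12\|r\|\le\|e\|\le\|r\|$, a discrete analogue of the equivalence behind (3.27).

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(A, b):
    """Solve a small square system by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


# Hypothetical affine family A(y) = I + y * diag(THETA), y in [0, 1].
THETA = [0.0, 0.25, 0.5, 0.75, 1.0]
F = [1.0] * 5


def apply_A(y, v):
    return [(1.0 + y * t) * vi for t, vi in zip(THETA, v)]


def truth(y):
    """Full-order solution u(y) = A(y)^{-1} f (used here only for checking)."""
    return [fi / (1.0 + y * t) for fi, t in zip(F, THETA)]


def galerkin(y, basis):
    """Reduced solution in span(basis): solve (Phi^T A(y) Phi) c = Phi^T f.

    With affine A(y), Phi^T A(y) Phi = Phi^T Phi + y Phi^T diag(THETA) Phi
    could be precomputed; here it is formed directly for brevity.
    """
    a_phi = [apply_A(y, phi) for phi in basis]
    G = [[dot(p, q) for q in a_phi] for p in basis]
    c = solve(G, [dot(p, F) for p in basis])
    return [sum(cj * phi[k] for cj, phi in zip(c, basis)) for k in range(len(F))]


def surrogate(y, basis):
    """R(y, U_n) = ||f - A(y) u_n(y)||: needs no full-order solve."""
    u_n = galerkin(y, basis)
    r = [fi - ai for fi, ai in zip(F, apply_A(y, u_n))]
    return math.sqrt(dot(r, r))


basis = [truth(0.0), truth(1.0)]   # two snapshots span U_2
for y in [0.1, 0.3, 0.5, 0.7, 0.9]:
    e = [ui - vi for ui, vi in zip(truth(y), galerkin(y, basis))]
    print(y, math.sqrt(dot(e, e)), surrogate(y, basis))
```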

The main pay-off of using the surrogate $R(y,U_n)$ is that one no longer needs to compute $u(y)$ but only the low-dimensional projection $\Pi_{U_n}u(y)$, by solving for each $y$ an $n\times n$ system, which itself can be rapidly assembled thanks to the affine parameter dependence [22]. However, one still faces the problem of its exact maximization over $y\in Y$. A standard approach is to maximize instead over a discrete training set $\widetilde Y_n\subset Y$, which in turn induces a discretization of the solution manifold

$$\widetilde{\mathcal{M}}_n=\{u(y):\ y\in\widetilde Y_n\}. \tag{3.28}$$

The resulting weak-greedy algorithm can be shown to remain rate-optimal in the sense of (3.24) and (3.25) if the discretization is fine enough so that $\widetilde{\mathcal{M}}_n$ constitutes an $\varepsilon_n$-approximation net of $\mathcal{M}$, where $\varepsilon_n$ does not exceed $c\,\mathrm{dist}(\mathcal{M},U_n^{\rm rb})_U$ for a suitable constant $0<c<1$. In the current regime of large or even infinite parameter dimensionality, this becomes prohibitive because $\#\widetilde Y_n$ would then typically scale like $O(\varepsilon_n^{-d})$ [10].
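The greedy selection over a discrete training set $\widetilde Y_n$ can be sketched in a few lines. For transparency this hypothetical example uses exact distances instead of a surrogate (so $\gamma=1$, the “strong” greedy variant), a made-up family $u(y)_k=1/(1+y\,x_k)$ on a grid of 20 points $x_k\in[0,1]$, and a fixed uniform training set of 21 parameters. Each new snapshot is orthonormalized by Gram-Schmidt, so that $P_{U_n}$ reduces to inner products.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def project_out(v, basis):
    """Return v - P_{U_n} v for an orthonormal basis of U_n."""
    r = list(v)
    for q in basis:
        c = dot(r, q)
        r = [ri - c * qi for ri, qi in zip(r, q)]
    return r


def greedy_reduced_basis(snapshots, tol, n_max):
    """Greedy selection (gamma = 1) over the discrete set {u(y): y in training set}.

    Returns the orthonormal basis, the selected indices, and the maximal
    training-set error max_y dist(u(y), U_n) recorded before each step.
    """
    basis, picked, errors = [], [], []
    while True:
        residuals = [project_out(u, basis) for u in snapshots]
        norms = [math.sqrt(dot(r, r)) for r in residuals]
        i = max(range(len(norms)), key=norms.__getitem__)
        errors.append(norms[i])
        if norms[i] <= tol or len(basis) == n_max:
            return basis, picked, errors
        r = project_out(residuals[i], basis)   # re-orthogonalize once for stability
        nr = math.sqrt(dot(r, r))
        basis.append([ri / nr for ri in r])
        picked.append(i)


# Hypothetical parametric family on a 20-point grid and a 21-point training set.
X = [k / 19 for k in range(20)]
TRAIN = [j / 20 for j in range(21)]
snapshots = [[1.0 / (1.0 + y * x) for x in X] for y in TRAIN]
basis, picked, errors = greedy_reduced_basis(snapshots, tol=1e-6, n_max=8)
print(picked, errors)
```

Because the spaces are nested, the recorded maximal errors can only decrease, and their decay mirrors that of the widths of this (analytic) family.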

As a remedy, it has been proposed in [10] to use training sets $\widetilde Y_n$ that are generated by randomly sampling $Y$, and to ask that the objective of rate optimality is met with high probability. This turns out to be achievable with training sets of much less prohibitive size. In an informal and simplified manner the main result can be stated as follows.


Theorem 1 Given any target accuracy $\varepsilon>0$ and some $0<\eta<1$, the weak-greedy reduced basis algorithm based on choosing at each step $N=N(\varepsilon,\eta)\sim|\ln\eta|+|\ln\varepsilon|$ randomly chosen training points in $Y$ has the following properties with probability at least $1-\eta$: it terminates with $\mathrm{dist}(\mathcal{M},U_{n(\varepsilon)})_U\le\varepsilon$ as soon as the maximum of the surrogate over the current training set falls below $c\,\varepsilon^{1+a}$ for some $c,a>0$. Moreover, if $d_n(\mathcal{M})_U\le Cn^{-s}$, then $n(\varepsilon)$ is bounded by a negative power of $\varepsilon$ whose exponent depends on $s$ and $b$. The constants $c$, $a$, $b$ depend on the constants in (3.26), as well as on the rate $r$ of polynomial approximability of the parameter-to-solution map $y\mapsto u(y)$. The larger $s$ and $r$, the smaller $a$ and $b$, and the closer the performance becomes to the ideal one.
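The randomized variant in Theorem 1 changes only where the candidates come from: at each step a fresh batch of $N$ parameters is drawn from $Y$. In the hypothetical sketch below, $N$ is taken as an arbitrary multiple (the made-up factor `oversample`) of $|\ln\eta|+|\ln\varepsilon|$, exact distances play the role of the surrogate, and the stopping tolerance `stop_tol` is a plain input, since the constants $c$ and $a$ of the theorem are not specified here. The parametric family is the same made-up one as in the previous sketch.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def project_out(v, basis):
    """Return v - P_{U_n} v for an orthonormal basis of U_n."""
    r = list(v)
    for q in basis:
        c = dot(r, q)
        r = [ri - c * qi for ri, qi in zip(r, q)]
    return r


X = [k / 19 for k in range(20)]


def snapshot(y):
    """Hypothetical parametric family u(y)_k = 1 / (1 + y x_k), y in [0, 1]."""
    return [1.0 / (1.0 + y * x) for x in X]


def random_greedy(eps, eta, stop_tol, n_max, rng, oversample=5):
    """Greedy selection with a fresh random training set of size N at every step."""
    n_train = oversample * math.ceil(abs(math.log(eta)) + abs(math.log(eps)))
    basis = []
    while len(basis) < n_max:
        ys = [rng.random() for _ in range(n_train)]         # y ~ uniform on [0, 1)
        residuals = [project_out(snapshot(y), basis) for y in ys]
        norms = [math.sqrt(dot(r, r)) for r in residuals]
        i = max(range(n_train), key=norms.__getitem__)
        if norms[i] <= stop_tol:                             # max over current set is small
            break
        r = project_out(residuals[i], basis)                 # re-orthogonalize once
        nr = math.sqrt(dot(r, r))
        basis.append([ri / nr for ri in r])
    return basis, n_train


basis, n_train = random_greedy(eps=1e-3, eta=1e-2, stop_tol=1e-5, n_max=10,
                               rng=random.Random(1))
# Check on a fine grid (not used by the algorithm) how well U_n captures the family.
grid_err = max(math.sqrt(dot(r, r)) for r in
               (project_out(snapshot(j / 1000), basis) for j in range(1001)))
print(len(basis), n_train, grid_err)
```

Only a few dozen random candidates are evaluated per step, independently of how finely $Y$ would have to be discretized for a deterministic net.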


4 Nonlinear Models 


4.1 Piecewise Affine Reduced Models 


As already noted, schemes based on linear or affine reduced models of the form $\mathcal{K}(U_n,\varepsilon)$ can, in general, not be expected to realize the benchmark (2.26) discussed earlier in Sect. 2. The convexity of the containment set $\mathcal{K}(U_n,\varepsilon)$ may cause the reconstruction error to be significantly larger than $\delta_\varepsilon(\mathcal{M},W)$. Another way of understanding this limitation is that, in order to make $\varepsilon$ small, one is forced to raise the dimension $n$ of $U_n$, making the quantity $\mu(U_n,W)$ larger, and eventually infinite if $n>m$.

To overcome this principal limitation one needs to resort to nonlinear models that better capture the non-convex geometry of $\mathcal{M}$. One natural approach consists in replacing the single space $U_n$ by a family $(U^k)_{k=1,\dots,K}$ of affine spaces

$$U^k=\bar u_k+\bar U^k,\qquad\dim(\bar U^k)=n_k\le m, \tag{4.1}$$

each of which aims to approximate a portion $\mathcal{M}_k$ of $\mathcal{M}$ to a prescribed target accuracy while simultaneously controlling $\mu(\bar U^k,W)$: fixing $\varepsilon>0$, we assume that we have at hand a partition of $\mathcal{M}$ into portions

$$\mathcal{M}=\bigcup_{k=1}^K\mathcal{M}_k \tag{4.2}$$

such that

$$\mathrm{dist}(\mathcal{M}_k,U^k)_U\le\varepsilon_k\quad\text{and}\quad\mu(\bar U^k,W)\,\varepsilon_k\le\varepsilon,\qquad k=1,\dots,K. \tag{4.3}$$


One way of obtaining such a partition is through a greedy splitting procedure of the domain $Y=[-1,1]^d$, which is detailed in [9]. The procedure terminates when, for each cell $Y_k$, the corresponding portion $\mathcal{M}_k$ of the manifold can be associated to an affine space $U^k$ satisfying these properties. We are ensured that this eventually occurs since, for a sufficiently fine cell $Y_k$, one has $\mathrm{rad}(\mathcal{M}_k)\le\varepsilon$, which means that we could then use a zero-dimensional affine space $U^k=\{\bar u_k\}$, for which we know that $\mu(\bar U^k,W)=1$. In this piecewise affine model, the containment property is now

$$\mathcal{M}\subset\bigcup_{k=1}^K\mathcal{K}(U^k,\varepsilon_k), \tag{4.4}$$

and the cardinality $K$ of the partition depends on the prescribed $\varepsilon$.
For a given measurement $w\in W$, we may now compute the state estimates

$$u_k^*(w)=A_{U^k}(w),\qquad k=1,\dots,K, \tag{4.5}$$

by the affine variant of the one-space method from (3.4). Since $u\in\mathcal{M}_{k_0}$ for some value $k_0$, we are ensured that

$$\|u-u_{k_0}^*(w)\|_U\le\varepsilon \tag{4.6}$$


for this particular choice. However, $k_0$ is unknown to us and one has to rely on the data $w$ in order to decide which one among the affine models is most appropriate for the recovery. One natural model selection criterion can be derived if, for any $u\in U$, we have at our disposal a computable surrogate $S(u)$ that is equivalent to the distance from $u$ to $\mathcal{M}$, that is

$$c\,S(u)\le\mathrm{dist}(u,\mathcal{M})_U\le C\,S(u),\qquad\mathrm{dist}(u,\mathcal{M})_U=\min_{y\in Y}\|u-u(y)\|_U, \tag{4.7}$$

for some fixed $0<c\le C$. We give an instance of such a computable surrogate in Sect. 4.2 below. The selection criterion then consists in picking the index $k^*$ that minimizes this surrogate among the different available state estimates, that is,

$$u^*(w):=u_{k^*}^*(w),\qquad k^*:=\mathop{\mathrm{argmin}}\{S(u_k^*(w)):\ k=1,\dots,K\}. \tag{4.8}$$
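The selection rule (4.8) can be tried on a deliberately simple non-convex example of our own: $U=\mathbb{R}^2$, $W=\mathrm{span}\{e_1\}$ and the “V-shaped” manifold $\mathcal{M}=\{(y,|y|):y\in[-1,1]\}$, split into the pieces $y\le0$ and $y\ge0$, each contained in a line $U^k$ through the origin. The surrogate $S$ is here simply the distance to $\mathcal{M}$ evaluated on a fine parameter grid (so $c\approx C\approx1$ in (4.7)); in a PDE setting it would be the residual-based quantity of Sect. 4.2. All names are hypothetical.

```python
import math

# Hypothetical toy manifold M = {(y, |y|) : y in [-1, 1]} in U = R^2; W = span{e_1}.
Y_GRID = [-1 + k / 1000 for k in range(2001)]

# Piecewise linear model: U^1 = span{(1, -1)} holds y <= 0, U^2 = span{(1, 1)} holds y >= 0.
PIECES = [(1.0, -1.0), (1.0, 1.0)]


def surrogate(u):
    """S(u): distance from u to M, evaluated on a fine parameter grid."""
    return min(math.hypot(u[0] - y, u[1] - abs(y)) for y in Y_GRID)


def one_space(w, phi):
    """One-space estimate (3.4) for U_n = span{phi}, n = m = 1.

    v* = (w / phi_1) phi matches the data exactly, so u* = (w, (w / phi_1) phi_2).
    """
    return (w, (w / phi[0]) * phi[1])


def select(w):
    """Model selection (4.8): compute all K estimates, keep the one closest to M."""
    estimates = [one_space(w, phi) for phi in PIECES]
    k_star = min(range(len(estimates)), key=lambda k: surrogate(estimates[k]))
    return k_star, estimates[k_star]


print(select(0.4))    # picks the second piece: (1, (0.4, 0.4))
print(select(-0.7))   # picks the first piece: (0, (-0.7, 0.7))
```

Both candidate estimates reproduce the data exactly; only the surrogate tells them apart, which is precisely its role in (4.8).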


The following result, established in [9], shows that this estimator now realizes the benchmark (2.26) up to a multiplication of $\varepsilon$ by $\kappa:=C/c$, where $c$, $C$ are the constants from (4.7).

Theorem 2 Assume that (4.2) and (4.3) hold. For any $u\in\mathcal{M}$, if $w=P_Wu$, one has

$$\|u-u^*(w)\|_U\le\delta_{\kappa\varepsilon}(\mathcal{M},W), \tag{4.9}$$

where $\delta_\varepsilon(\mathcal{M},W)$ is given by (2.25).


4.2 Approximate Metric Projection and Parameter Estimation


A practically affordable realization of the surrogate $S(u)$, providing a near-metric projection distance to $\mathcal{M}$, is a key ingredient of the above nonlinear recovery scheme. Since it has further useful implications, we add a few comments on that matter.

As already observed in (2.5), whenever (2.1) admits a stable variational formulation with respect to a suitable pair $(U,V)$ of trial and test spaces, the distance of any $u\in U$ to any $u(y)\in\mathcal{M}$ is uniformly equivalent to the residual of the PDE in $V'$:

$$c\,\|\mathcal{R}(u,y)\|_{V'}\le\|u(y)-u\|_U\le C\,\|\mathcal{R}(u,y)\|_{V'}, \tag{4.10}$$

where $c$ and $C$ are the reciprocals of the stability constants in (2.5). Assume in addition that $\mathcal{R}(u,y)$ depends affinely on $y\in Y$, according to (2.9). Then, minimizing $\|\mathcal{R}(u,y)\|_{V'}$ over $y$ is equivalent to solving a constrained least-squares problem

$$\bar y=\mathop{\mathrm{argmin}}_{y\in Y}\|\mathbf{g}-\mathbf{M}y\|_2, \tag{4.11}$$

where $\mathbf{M}$ is a matrix of size $d\times d$ resulting from Riesz lifts of the functionals $\mathcal{R}_j(u)$.
The solution to this problem therefore satisfies

$$\|u-u(\bar y)\|_U\le\kappa\min_{y\in Y}\|u-u(y)\|_U=\kappa\,\mathrm{dist}(u,\mathcal{M})_U, \tag{4.12}$$

where $\kappa=C/c$ is the quotient between the equivalence constants in (4.10). The surrogate

$$S(u):=\|\mathcal{R}(u,\bar y)\|_{V'} \tag{4.13}$$

for the metric projection distance of $u$ onto $\mathcal{M}$ obviously satisfies (4.7). It is indeed computable at affordable cost using (an approximation, in a finite-dimensional space $V_\delta\subset V$, to) its Riesz-lifted version $\|e(u,\bar y)\|_V=\|\mathcal{R}(u,\bar y)\|_{V'}$, assembled from the Riesz lifts of the components $\mathcal{R}_j(u)$ in the affine expansion (2.9); see [9] for details.
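For (4.11) with the box $Y=[-1,1]^d$, any solver for bound-constrained least squares can be used; the text does not prescribe one. A minimal sketch, assuming projected gradient descent with the safe step size $1/\|\mathbf{M}\|_F^2$ (the Frobenius norm bounds the spectral norm from above), and with a made-up $2\times2$ matrix $\mathbf{M}$ and vector $\mathbf{g}$ instead of actual Riesz lifts:

```python
def box_least_squares(M, g, iters=5000):
    """Approximately solve min_{y in [-1, 1]^d} ||g - M y||_2 by projected gradient.

    Step size 1/L with L = ||M||_F^2 >= ||M^T M||_2, which guarantees descent.
    """
    rows, d = len(M), len(M[0])
    L = sum(x * x for row in M for x in row)
    y = [0.0] * d
    for _ in range(iters):
        r = [gi - sum(mij * yj for mij, yj in zip(row, y)) for row, gi in zip(M, g)]
        grad = [-sum(M[i][j] * r[i] for i in range(rows)) for j in range(d)]  # -M^T r
        y = [min(1.0, max(-1.0, yj - gj / L)) for yj, gj in zip(y, grad)]     # project onto box
    return y


# Hypothetical data: the unconstrained minimizer (0.5, 3.0) lies outside the box,
# so the constrained solution is (0.5, 1.0).
print(box_least_squares([[2.0, 0.0], [0.0, 1.0]], [1.0, 3.0]))
```

The returned $\bar y$ is an admissible parameter by construction, which is what makes it usable for the parameter estimation discussed next.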

Since solving the above problem provides an admissible parameter value $\bar y\in Y$, this also has some immediate bearing on parameter estimation. Suppose we wish to estimate from $w=P_Wu(y^*)$ the unknown parameter $y^*\in Y$. Assume further that $A$ is any given linear or nonlinear recovery map. Computing along the above lines

$$y_w=\mathop{\mathrm{argmin}}_{y\in Y}\|\mathcal{R}(A(w),y)\|_{V'},$$

we have

$$\begin{aligned}\|u(y^*)-u(y_w)\|_U&\le\|u(y^*)-A(w)\|_U+\|A(w)-u(y_w)\|_U\\&\le E_{\rm wc}(A,\mathcal{M},W)+\kappa\,\mathrm{dist}(A(w),\mathcal{M})_U\le(1+\kappa)\,E_{\rm wc}(A,\mathcal{M},W).\end{aligned} \tag{4.14}$$
We now consider the specific elliptic model (2.6) with affine diffusion coefficients $a(y)$ given by (2.10). For this model, it was established in [5] that, for strictly positive $f$ and certain regularity assumptions on $a(y)$ as functions of $x\in\Omega$, parameters may be estimated from states. Specifically, when $a(y)\in H^1(\Omega)$ uniformly in $y\in Y$, one has an inverse stability estimate of the form

$$\|a(y)-a(\tilde y)\|_{L_2(\Omega)}\le C\,\|u(y)-u(\tilde y)\|_U^{1/6}. \tag{4.15}$$

Thus, whenever the recovery map $A$ satisfies (4.9) for some prescribed $\varepsilon>0$, combining (4.14) with (4.15) yields a parameter estimation bound of the form

$$\|a(y^*)-a(y_w)\|_{L_2(\Omega)}\le C\,\delta_{\kappa\varepsilon}(\mathcal{M},W)^{1/6}.$$

Note that when the basis functions $\theta_j$ are $L_2$-orthogonal, $\|a(y^*)-a(y_w)\|_{L_2(\Omega)}$ is equivalent to a (weighted) $\ell_2$ norm of $y^*-y_w$.
4.3 Concluding Remarks


The affine or piecewise affine recovery scheme hinges on the ability to approximate a solution manifold effectively by linear or affine spaces, globally or locally. As explained earlier, this is true for problems of elliptic or parabolic type that may include convective terms as long as they are dominated by diffusion. This may, however, no longer be the case when dealing with pure transport equations or models involving strongly dominating convection.

An interesting alternative would then be to adopt a stochastic model according to (2.27) and (2.28) that allows one to view the construction of the recovery map as a regression problem. In particular, when dealing with transport models, deep neural networks are a natural candidate for parametrizing a reduced model. However, properly adapting the architecture, regularization and training principles poses wide-open questions that are addressed in current work in progress.
Leah Edelstein-Keshet


Abstract While most of our tissues appear static, in fact, cell motion comprises 
an important facet of all life forms, whether in single or multicellular organisms. 
Amoeboid cells navigate their environment seeking nutrients, whereas collectively, 
streams of cells move past and through evolving tissue in the development of com- 
plex organisms. Cell motion is powered by dynamic changes in the structural pro- 
teins (actin) that make up the cytoskeleton, and regulated by a circuit of signaling 
proteins (GTPases) that control the cytoskeleton growth, disassembly, and active 
contraction. Interesting mathematical questions we have explored include (1) How 
do GTPases spontaneously redistribute inside a cell? How does this determine the 
emergent polarization and directed motion of a cell? (2) How does feedback between 
actin and these regulatory proteins create dynamic spatial patterns (such as waves) 
in the cell? (3) How do properties of single cells scale up to cell populations and 
multicellular tissues given interactions (adhesive, mechanical) between cells? Here 
I survey mathematical models studied in my group to address such questions. We 
use reaction-diffusion systems to model GTPase spatiotemporal phenomena in both 
detailed and toy models (for analytic clarity). We simulate single and multiple cells 
to visualize model predictions and study emergent patterns of behavior. Finally, we 
work with experimental biologists to address data-driven questions about specific 
cell types and conditions. 


1 Introduction: Motile Cells and Their Inner Workings 


Many types of cells are endowed with the ability to move purposefully. As an exam- 
ple, neutrophils, shown in Fig. 1a, are white blood cells that make up part of our 
immune system, in charge of patrolling tissues for pathogens or sites of injury. The 
motion of unicellular organisms such as bacteria, while interesting in its own right, 
is governed by distinct mechanisms that will not be discussed here. 
Fig. 1 Cell motility and cell polarization: from biology to mathematical model: a A white blood cell (neutrophil) moving between red blood cells (disk-shaped objects) from a 1950s movie clip by David Rogers. The 1D band represents a transect of the cell from front to back. We are concerned with how the cell breaks symmetry and polarizes to define such a front-back axis. b, c Sketch of a cell in top-down (b) and side (c) views, indicating the same 1D axis. d In our mathematical model, we aim to explain how regulatory proteins in the cell (called GTPases) spontaneously polarize and form hot spots of activity that define the front and back of the cell. e In our abstract “wave-pinning” model, this same process is depicted as a 1D pattern-formation event, with a wave that stalls to produce a polarized distribution


In a movie dating to the 1950s, David Rogers (then at Vanderbilt University) captured the amoeboid movements of a neutrophil as it navigates between red blood cells (disk-shaped objects in Fig. 1a). In this movie, which can be seen on a popular YouTube site, we see a crawling cell with a dynamic shape: a broad front that pushes outwards, and a thin tail that is pulled along as the cell moves. Figure 1b, c shows two projections of cell shape (top down in (b) and side view in (c)) that we later utilize in modeling cell polarization.

It is worth pointing out the sizes and timescales that concern us here. In contrast to some papers (e.g. that of Prof. Marsha Berger, whose work describes geological size scales and timescales of hours and days [1]), here we deal with the micro-world of cells, whose diameter is on the order of 10-30 μm. The timescale of relevance is on the order of seconds. As summarized in Table 1, the process of cell polarization, which defines the front and back of the cell and specifies its direction of motion, takes place over seconds across the tiny cell diameter. Also noteworthy is the fact that the production of new copies of proteins (i.e. protein synthesis) does not suffice to explain how protein activity becomes concentrated at some parts of a cell, since synthesis takes hour(s), while fast-moving cells like neutrophils are known to respond to polarizing stimuli within only seconds.

Here the purpose is to explain an important first step in cell motility: the symmetry 
breaking that creates a front and a back in the cell (Fig. 1d), namely the polarization 
of the cell. But before embarking on the mathematics that describes this process, we 
first discuss the important cellular components that are involved. 
Table 1 Typical sizes and speeds of cells, and typical time-scales of protein synthesis and activation

Cell part or process          Typical value
Cell diameter                 10-30 μm
Cell thickness                0.1 μm
Cell speed (WBC)              0.1-0.2 μm/s
Response time to stimuli      Few seconds
Protein synthesis time        Hour(s) (!!)
Protein activation time       Few seconds
Diffusion rates (proteins)    0.1-10 μm²/s

Recall that 1 μm = 10⁻⁶ m. WBC: white blood cell (neutrophil)


1.1 Actin Powers Cell Motility 


Unlike plants and bacteria, animal cells have no tough outer cell wall. They are 
enclosed in a lipid membrane that envelopes the interior, which in turn includes the 
fluid cytosol and many organelles. Most organelles, including the cell’s nucleus, are not directly involved in powering cellular motion.

Without some structural components, the cell would be essentially a bag of fluids. 
An internal “skeleton” (called the cytoskeleton) is formed by a meshwork of fila- 
mentous actin (F-actin), a dynamic biopolymer protein structure that is assembled at 
what becomes the cell front. The polymerization of actin leads to protrusion of the 
cell front [23]. Meanwhile, in association with the motor protein myosin, contraction 
of actomyosin leads to retraction of the rear portion of the cell [33], Fig. 2a. 

Due to the abundance of actin monomers at excess concentration in every cell, 
actin assembly would be an explosive process were it not tightly controlled by many 
interacting regulatory cellular proteins. Many of those proteins, discovered and char- 
acterized experimentally over the last decades [27, 34], interact with actin to make it 
branch, to cut or cap its growing ends, to sequester or to recycle its monomeric sub- 
units. Other proteins play the role of master-regulators that control the components 
of the cytoskeleton [30]. 


1.2 GTPases Are Master Regulators 


One important class of proteins that regulate the cytoskeleton is the class of Rho 
GTPases, among which Rac and Rho are well known [3]. In the schematic Fig. 2, 
GTPases are shown to promote the assembly of filamentous actin, and the activity 
of myosin contraction. The GTPase Rac does the former, while the GTPase Rho 
enables the latter. Hence, if we can explain how Rac and Rho activities concentrate 
at one or another part of the cell, we can also explain the localizations of a front and 
rear cellular axis, and hence cell polarization. This, then, is the main focus of our approach.
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Active GTPase (u) 
Inactive GTPase (v) 


FRONT REAR 


Fig. 2 Schematic diagram of the cell’s motility machinery: a Actin filaments (F-actin), rep- 
resented as blue curves, assemble at what becomes the cell front. Actin polymerization leads to 
protrusion at the front edge of the cell. In the cell rear, myosin motors (not shown) associate with 
F-actin to contract and pull up the “tail”. Proteins in the class known as Rho GTPases are master 
regulators. These proteins control where and when actin assembly and myosin contraction take 
place. GTPases play an essential role in cell polarization. b Each GTPase has an active and an 
inactive state, modeled by the variables u, v. Only when bound to the cell membrane (shown in 
yellow) is the GTPase active. $A$ and $I$ denote rates of activation and inactivation


Interestingly, proteins in the family of Rho GTPases have a curious life-cycle. 
They occur in active and inactive forms, with only the active forms exerting the 
effects mentioned above [8]. Moreover, the active forms are always bound to the 
fatty membrane that forms the outer cell envelope (shown in yellow in Fig. 2). Hence,
the small GTPases spend their cellular lives shuttling between the cell membrane
(where part of their structure gets embedded when active) and the cell interior (where
they are entirely inactive). This basic idea is illustrated in Fig. 2b. The GTPases act
as cellular switches that are “ON” when active and “OFF” otherwise.

A natural question one could ask is: what is the functional purpose of the GTPase
cycling between the cell's membrane and the cell's interior? As we shall see,
mathematics may have something to contribute towards answering such questions. A
second question is: what property of the cellular machinery accounts for the
spontaneous polarization of the cell? That is, how do GTPases redistribute so that their
levels of activity differ between the front and rear of a cell [2]?


2 Mathematical Models 


In our earliest works on cell polarization, we attempted to account for many known 
features of the GTPase activity and their crosstalk and interactions [6, 18, 20]. Such 
models were largely computational, as it was a challenge to analyse them mathemat- 
ically. It was clear that more basic model variants would be useful for mathematical 
progress to be feasible. 

Fig. 3 Model geometry: The complicated cell geometry is simplified into a 1D domain (transect
along the cell diameter) with active and inactive proteins distributed along that axis, but with distinct
rates of diffusion, $D_u \ll D_v$

As described in Mori et al. [24, 25], we simplify a very complicated cellular pro-
cess to allow for mathematical tractability. We thereby hope to identify key elements
that allow for spontaneous cell polarization. First, we consider just one GTPase (say
Rac), rather than the entire network (Cdc42, Rac and Rho). We ask which biological 
attributes account for spontaneous symmetry breaking and polar pattern formation. 
To investigate this, we construct the following mathematical model. 

We define $u(t)$, $v(t)$ to be the concentrations of the active and inactive forms of
the GTPase. Then, based on the schematic diagram in Fig. 2b, it follows that

$$\frac{du}{dt} = Av - Iu, \qquad \frac{dv}{dt} = -Av + Iu.$$
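Two properties of these well-mixed kinetics are worth seeing concretely: the total $T = u + v$ never changes, and $u$ relaxes to $AT/(A+I)$. The minimal forward-Euler sketch below illustrates both; the rate values, initial data and step size are hypothetical choices for illustration, not values from the text.

```python
def simulate_well_mixed(A, I, u0, v0, dt=1e-3, t_end=20.0):
    """Forward-Euler integration of du/dt = A v - I u, dv/dt = -A v + I u."""
    u, v = u0, v0
    for _ in range(round(t_end / dt)):
        flux = A * v - I * u  # net conversion from inactive to active form
        u, v = u + dt * flux, v - dt * flux  # what u gains, v loses
    return u, v


A, I = 2.0, 1.0  # hypothetical constant rates (units of 1/time)
u, v = simulate_well_mixed(A, I, u0=0.0, v0=3.0)
print("total u + v:", u + v)  # stays at the initial total, up to rounding
print("u:", u, "predicted A*T/(A+I):", A * 3.0 / (A + I))
```

Because the system is linear, its only steady state is this uniform one, which is why nonlinearity is needed before anything resembling polarization can appear.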

This is not yet enough, since spatial distribution is a vital aspect. Hence, we require 
a spatial variable, and need to account for the localization of each of u, v. To do so, 
we also need to define the geometry of interest. 

As argued earlier, and noted in Fig. 1, to explain symmetry breaking for polariza- 
tion, a 1D model along the front-back axis suffices. And while the detailed residence 
of the proteins on the membrane or cell interior is important, it proves helpful to 
simplify this too, in the steps shown in Fig.3. In that figure, we first idealize the 
cell as a thin sheet of uniform thickness, surrounded top and bottom by a membrane 
(yellow outline). Zooming in on a small portion of the cell, we might see active (red) 
and inactive (black) copies of the GTPase associated with the membrane or the fluid 
cell interior. We homogenize these compartments, treating both u and v as dependent 
variables on a 1D spatial domain $0 < x < L$, where $L$ is the cell diameter. We do,
however, take into account the very different rates of diffusion of a protein in the
membrane ($D_u \sim 0.01$ to $0.1\ \mu\mathrm{m}^2/\mathrm{s}$) versus the fluid cell interior ($D_v \sim 10\ \mu\mathrm{m}^2/\mathrm{s}$)
[28]. As we shall see, this huge disparity in diffusion plays a significant role.
The model becomes

$$\frac{\partial u}{\partial t} = D_u \frac{\partial^2 u}{\partial x^2} + Av - Iu, \tag{1a}$$
$$\frac{\partial v}{\partial t} = D_v \frac{\partial^2 v}{\partial x^2} - Av + Iu. \tag{1b}$$


In general, the rates of activation and inactivation, $A$ and $I$, are not merely constant.
If they were, then Eq. (1) would be linear in $u$, $v$, and would have fairly uninteresting
steady-state solutions. Some nonlinearity is essential, and this also requires
feedback, something that can only depend on levels of active proteins. (Recall that
the inactive GTPases do not participate in any interactions.) We have considered
models where many other proteins influence each of the state transitions [14, 18,
21], and in that case, the model would expand in complexity:


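$$\frac{\partial u_i}{\partial t} = D_u \frac{\partial^2 u_i}{\partial x^2} + A_i(u_1, \dots, u_N)\, v_i - I_i(u_1, \dots, u_N)\, u_i, \tag{2a}$$
$$\frac{\partial v_i}{\partial t} = D_v \frac{\partial^2 v_i}{\partial x^2} - A_i(u_1, \dots, u_N)\, v_i + I_i(u_1, \dots, u_N)\, u_i, \tag{2b}$$

where the rates $A_i$, $I_i$ now depend on the levels of the other active proteins $u_1, \dots, u_N$.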


Such examples, considered in the context of biological experiments, are briefly 
discussed further on, but mathematically, they are harder to analyze. 

Our ultimate purpose, mathematically, is to strip away such complexity and focus
on the most elementary example, where a single GTPase polarizes on its own. To do
so, we considered the version


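$$\frac{\partial u}{\partial t} = D_u \frac{\partial^2 u}{\partial x^2} + A(u)\, v - I u, \tag{3a}$$
$$\frac{\partial v}{\partial t} = D_v \frac{\partial^2 v}{\partial x^2} - A(u)\, v + I u, \tag{3b}$$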

with feedback exclusively in the activation rate $A(u)$ and a constant rate of inactiva-
tion $I$. This specific choice is somewhat arbitrary, as shown in [18], since it is possible
to obtain essentially the same behaviour with nonlinearity introduced by assuming
that $I = I(u)$ with $A$ constant, or by other variants where both $A$ and $I$ depend on
$u$. The biological interpretation is somewhat different, since distinct proteins in cells
play the role of activating (GEFs) and inactivating (GAPs) the GTPases. In the case
of constant $I$, we can rescale time so that $I = 1$. Altogether, then, the single-GTPase
system consists of the pair of PDEs
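$$\frac{\partial u}{\partial t} = D_u \frac{\partial^2 u}{\partial x^2} + f(u, v), \tag{4a}$$
$$\frac{\partial v}{\partial t} = D_v \frac{\partial^2 v}{\partial x^2} - f(u, v), \tag{4b}$$

with

$$f(u, v) = \left(b + \gamma \frac{u^n}{1 + u^n}\right) v - u, \tag{4c}$$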


where $b$ is the basal rate of activation and $\gamma$ is an additional rate of activation depicting
positive feedback from $u$ to its own activation. The constant $n \geq 2$ is the so-called
“Hill coefficient”. Larger values of $n$ result in sharper switching between states.
We also assume Neumann boundary conditions, namely, 


$$u_x(0,t) = 0, \quad u_x(L,t) = 0, \quad v_x(0,t) = 0, \quad v_x(L,t) = 0. \tag{4d}$$


This signifies that no material leaks out of the ends of the 1D domain, i.e. that the 
cell ends are sealed. 

Notably, on the timescale of interest (a few seconds), no protein is made or lost, 
it is merely exchanged between the active and inactive states (see Table 1). This is 
captured by the model, since it is easy to see that the total amount of protein in the 
domain is conserved, that is, 


$$\text{Mean total concentration} = \frac{1}{L} \int_0^L \big(u(x,t) + v(x,t)\big)\, dx = \text{constant}. \tag{5}$$


As shown in [24, 25], the following properties are necessary and sufficient to 
ensure that a unimodal pattern (depicting a polarized distribution) will exist as a 
nonuniform steady state of the model: 


1. There is some range of values $v_1 < v < v_2$ for which the function $f(u, v)$ has
three roots, $u_a < u_m < u_b$. (We refer to this range of $v$ as the bistable regime.)

2. Of these three roots, the outer two ($u_a$, $u_b$) are stable fixed points of the spatially
homogeneous variant of (4).

3. For some value $v^*$ in $v_1 < v < v_2$, there is a change in the sign of the integral

$$\int_{u_a}^{u_b} f(u, v)\, du.$$

4. The rates of diffusion of $u$ and $v$ are sufficiently different: $D_u \ll D_v$.
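Conditions 1-3 can be checked numerically for the kinetics (4c). The sketch below assumes the illustrative values $b = 0.067$, $\gamma = 1$, $n = 2$ (our choice; this section does not fix them). It brackets the roots of $f(\cdot, v)$ for fixed $v$, tests their stability in the well-mixed equation $du/dt = f(u, v)$, and locates $v^*$ by bisection on the integral of Condition 3.

```python
B, GAMMA, N_HILL = 0.067, 1.0, 2  # assumed illustrative parameter values


def f(u, v):
    """Reaction term (4c), with the inactivation rate rescaled to 1."""
    return (B + GAMMA * u**N_HILL / (1.0 + u**N_HILL)) * v - u


def roots_in_u(v, grid=2000):
    """Roots of f(., v) for fixed v: bracket sign changes on a grid, refine by bisection."""
    u_max = (B + GAMMA) * v + 1.0  # f(u, v) < 0 for every u > (B + GAMMA) * v
    xs = [u_max * i / grid for i in range(grid + 1)]
    roots = []
    for lo, hi in zip(xs, xs[1:]):
        if (f(lo, v) > 0) != (f(hi, v) > 0):
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                if (f(mid, v) > 0) == (f(lo, v) > 0):
                    lo = mid
                else:
                    hi = mid
            roots.append(0.5 * (lo + hi))
    return roots


def is_stable(u, v, h=1e-6):
    """A root of du/dt = f(u, v), with v held fixed, is stable when df/du < 0."""
    return (f(u + h, v) - f(u - h, v)) / (2 * h) < 0


def maxwell_integral(v, steps=2000):
    """Trapezoidal estimate of the integral of f(u, v) over [u_a, u_b] (Condition 3)."""
    u_a, _, u_b = roots_in_u(v)  # requires v inside the bistable range
    h = (u_b - u_a) / steps
    inner = sum(f(u_a + k * h, v) for k in range(1, steps))
    return h * (0.5 * f(u_a, v) + inner + 0.5 * f(u_b, v))


def find_v_star(v_lo, v_hi, iters=40):
    """Bisection for the Maxwell point v*, where the integral changes sign."""
    for _ in range(iters):
        v_mid = 0.5 * (v_lo + v_hi)
        if maxwell_integral(v_mid) < 0:
            v_lo = v_mid
        else:
            v_hi = v_mid
    return 0.5 * (v_lo + v_hi)


for v in (1.0, 1.95, 3.0):
    rs = roots_in_u(v)
    print(v, [round(r, 3) for r in rs], [is_stable(r, v) for r in rs])
print(maxwell_integral(1.76), maxwell_integral(1.9))
print("v* ~", find_v_star(1.76, 1.9))
```

For these assumed values the bistable range is roughly $1.75 < v < 2.01$; at $v = 1.95$ the outer roots are stable and the middle one unstable, and the integral is negative at $v = 1.76$ and positive at $v = 1.9$, so $v^*$ lies between them.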


Fig. 4 Travelling waves versus wave-pinning: a A single reaction-diffusion equation (4a) (for
constant $v$) with kinetics of type (4c) is known to sustain traveling wave solutions for $u(x, t)$.
b In contrast, the system of Eqs. (4a)-(4d) with conservation and distinct rates of diffusion ($D_u \ll D_v$)
results in waves that stop inside the domain, a phenomenon we termed “wave-pinning”

It is interesting to contrast the system (4) with a related one consisting of (4a),
(4c) and (4d) but with $v$ = constant, that is, with a single bistable reaction-diffusion
equation in one variable, $u$. The latter is known to sustain traveling wave solutions, as
shown in Fig. 4a. In contrast, the two-variable system (4a)-(4d) leads to waves that
decelerate and stop inside the domain (once the sign condition above is satisfied), as
demonstrated in Fig. 4b. We refer to this behaviour as “wave-pinning”. We see that
Fig. 4a fails to explain polarization, because the entire cell diameter eventually becomes
uniformly active. Figure 4b is consistent with polarization, since the two ends of the domain
develop distinct levels of activity as time goes by. In this sense, wave-pinning is a
simple caricature of cell polarization.
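The pinning scenario of Fig. 4b can be reproduced qualitatively with a short simulation. The sketch below is a minimal semi-implicit finite-difference scheme, not the method used in [24, 25]; it assumes $b = 0.067$, $\gamma = 1$, $n = 2$, $D_u = 0.1$, $D_v = 10$, $L = 10$ and a step-like initial condition, all illustrative choices. Reactions are stepped explicitly and diffusion implicitly with zero-flux ends (4d). Every column of the tridiagonal matrix sums to one, so the discrete form of the conservation law (5) holds to rounding error.

```python
def f(u, v, b=0.067, gamma=1.0, n=2):
    """Reaction term (4c), with the inactivation rate rescaled to 1."""
    return (b + gamma * u**n / (1.0 + u**n)) * v - u


def implicit_diffusion(rhs, alpha):
    """Solve (I - alpha * Lap) w = rhs by the Thomas algorithm, alpha = D dt / dx^2.

    Lap is the 3-point stencil with zero-flux (Neumann) ends. Each column of the
    matrix sums to 1, so sum(w) equals sum(rhs): the step conserves mass.
    """
    n = len(rhs)
    diag = [1.0 + 2.0 * alpha] * n
    diag[0] = diag[-1] = 1.0 + alpha  # boundary rows for zero flux
    c_prime, d_prime = [0.0] * n, [0.0] * n
    c_prime[0] = -alpha / diag[0]
    d_prime[0] = rhs[0] / diag[0]
    for i in range(1, n):  # forward elimination; off-diagonals are all -alpha
        denom = diag[i] + alpha * c_prime[i - 1]
        c_prime[i] = -alpha / denom
        d_prime[i] = (rhs[i] + alpha * d_prime[i - 1]) / denom
    w = [0.0] * n
    w[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        w[i] = d_prime[i] - c_prime[i] * w[i + 1]
    return w


def simulate_wave_pinning(L=10.0, n_cells=100, t_end=100.0, dt=0.05, Du=0.1, Dv=10.0):
    """Semi-implicit Euler for (4a)-(4d) on a cell-centred grid."""
    dx = L / n_cells
    # Assumed initial data: u near its high state on the left quarter, near its
    # low state elsewhere; v uniform at 1.95, inside the bistable range.
    u = [1.45 if (i + 0.5) * dx < 0.25 * L else 0.22 for i in range(n_cells)]
    v = [1.95] * n_cells
    alpha_u, alpha_v = Du * dt / dx**2, Dv * dt / dx**2
    for _ in range(round(t_end / dt)):
        r = [f(ui, vi) for ui, vi in zip(u, v)]
        u = implicit_diffusion([ui + dt * ri for ui, ri in zip(u, r)], alpha_u)
        v = implicit_diffusion([vi - dt * ri for vi, ri in zip(v, r)], alpha_v)
    return u, v, dx


def front_position(u, dx, threshold=0.7):
    """Centre of the first grid cell where u falls below the threshold."""
    for i, ui in enumerate(u):
        if ui < threshold:
            return (i + 0.5) * dx
    return len(u) * dx


u, v, dx = simulate_wave_pinning()
print("mean total u + v:", sum(ui + vi for ui, vi in zip(u, v)) / len(u))
print("front at x =", front_position(u, dx), "(started near x = 2.5)")
```

With this initial condition the conserved total admits no stable uniform steady state, so the front cannot sweep the domain or collapse; it advances from $x \approx 2.5$ and decelerates as $v$ is depleted, leaving a high plateau on the left and a low one on the right.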


2.1 How Wave-Pinning Works 


Full details of the analysis of such dynamics are described in [25]. Here it suffices 
to briefly mention the key asymptotic analysis ideas used in establishing the result. 
The system (4) is rescaled to exploit the existence of a small parameter

$$\varepsilon^2 = \frac{D_u}{r L^2},$$

where $r$ is a typical kinetics rate constant with units of 1/time (e.g., $r = \gamma$). We then
examine the short and intermediate time-scales of the rescaled system.
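To give a sense of how small $\varepsilon$ is, take $D_u = 0.1\ \mu\mathrm{m}^2/\mathrm{s}$ (the upper end of the membrane range quoted above), together with an assumed cell diameter $L = 10\ \mu\mathrm{m}$ and an assumed kinetic rate $r = 1\ \mathrm{s}^{-1}$; the last two values are illustrative, not taken from the text:

$$\varepsilon^2 = \frac{D_u}{rL^2} = \frac{0.1\ \mu\mathrm{m}^2\,\mathrm{s}^{-1}}{(1\ \mathrm{s}^{-1})\,(10\ \mu\mathrm{m})^2} = 10^{-3}, \qquad \varepsilon \approx 0.032.$$

With $D_u = 0.01\ \mu\mathrm{m}^2/\mathrm{s}$ instead, $\varepsilon^2 = 10^{-4}$ and $\varepsilon = 0.01$; either way $\varepsilon \ll 1$, which is what the asymptotic analysis exploits.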

On a short time-scale ($t_s = t/\varepsilon$), it can be shown that, to leading order, at various
sites in the domain, $u$ approaches one of its stable steady-state values $u_a$, $u_b$. This means
that the domain is “carved up” into plateaus of high and of low activity levels $u$, separated
by transition layers between them.

To make progress, we consider the case of a single interface separating a low
and a high plateau. Let the position of the interface be $\phi(t)$. We go on to seek the
intermediate time-scale behaviour. We construct an inner and an outer solution next to
the transition layer and show that, to leading order, the variable $v$ is roughly spatially
constant on the two sides of the interface, $v \approx V_0(t)$, while it is depleted in time as $u$
evolves.
Fig. 5 Regimes of wave-pinning: Wave-pinning, which represents cell polarization, depends on a
balance between the total amount of GTPase (5) and the size of the small parameter $\varepsilon = \sqrt{D_u/(rL^2)}$.
If the total amount is too small, the wave of activity collapses, whereas if it is too large, the wave
sweeps across the entire domain, and a net homogeneous state results. Polarization can also be lost
in several ways: (1) if the cell size decreases too much, and hence $\varepsilon$ increases, the system leaves the
polarization regime; (2) if cell size increases so that the mean total GTPase becomes too “diluted”,
polarization can also be lost. Image credit: Alexandra Jilkine


Using standard wave-speed analysis, we construct the speed of the wave,
finding it to be described by a ratio of two integrals,

$$\text{speed} = \frac{\int_{u_a}^{u_b} f(u, v)\, du}{I_2}.$$

Here $u_a$, $u_b$ depend on $V_0(t)$, and $I_2$ is a strictly positive integral. We argue that the
wave stops when the numerator vanishes, which is guaranteed to happen at some
point by Condition 3, a Maxwell condition. Indeed, once $v$ is depleted sufficiently,
to the level $v^*$, the integral in the numerator vanishes. Details and discussion of the
steps appear in [25]. Regimes of polarization are shown in Fig. 5.
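Combining the pinned state with the conservation law (5) gives a leading-order estimate of where the wave stops. This is a sketch under stated assumptions: the high plateau occupies $0 < x < \phi^*$, the transition layer has negligible width, $v \approx v^*$ throughout, and $T$ denotes the mean total concentration in (5):

$$T = \frac{1}{L}\Big[u_b(v^*)\,\phi^* + u_a(v^*)\,(L - \phi^*)\Big] + v^* \quad\Longrightarrow\quad \frac{\phi^*}{L} = \frac{T - v^* - u_a(v^*)}{u_b(v^*) - u_a(v^*)}.$$

A front pinned inside the domain ($0 < \phi^* < L$) therefore requires $u_a(v^*) + v^* < T < u_b(v^*) + v^*$, consistent with the regimes in Fig. 5: with too little total GTPase the wave collapses, and with too much it sweeps the whole domain.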

Intuitively, the result can be explained as follows: at the transition zone, the high u 
plateau activates an adjoining site by virtue of local diffusion and positive feedback. 
The spread of u, however, is at the expense of the inactive form v, which gets 
depleted as the wave of activity spreads. Once v is sufficiently depleted, the spread 
of the activity wave can no longer be sustained. At that point, the wave freezes. 

It is also interesting to note that the fast diffusion of v means that it acts as a “global
messenger” in the sense that it rapidly stores domain-wide information about the level
of activity in the cell. Hence, local activation (of u by itself) and global depletion (of 
v) synergize to produce the polarization of activity in the domain. 
3 Recent Work: Analysis, Simulation, and Contact with Experiments


The wave-pinning equations are merely a prototype of the dynamics of a protein in 
the small GTPase family. Related systems with greater levels of biological detail have 
also been explored [12, 14, 21]. Indeed, insights by AFM Marée in [20] contributed
to the understanding that led to the mathematical treatment of wave-pinning in 
[24, 25]. 


3.1 Analysis of Slow-Fast Reaction Diffusion Systems: LPA 


While studying systems of reaction-diffusion equations (RDEs) for cell polarization, 
we have benefitted from a number of recent methods that result in shortcuts for 
quick diagnosis of pattern-formation regimes. Among these, the “Local Perturbation 
Analysis” (LPA) is a method to track local and global variables in RDEs using ODEs
that approximate the fate of a small peak of activity ($u_\ell$). This method was invented
by AFM Marée and V Grieneisen [9, 36], and popularized in several papers [11, 12,
15]. It has helped us to identify approximate regimes where a nonuniform pattern
could form by a finite perturbation of a spatially uniform state in a fast-slow reaction-
diffusion system.

Figure 6 illustrates a typical LPA bifurcation result, and its interpretation. The 
method identifies the existence of a spatially uniform global branch (in black), and 
parameter regimes where this branch is stable (solid) or unstable (dot-dashed curve). 
Even when the global homogeneous steady state is stable, a polarized pattern can be
established with a large enough stimulus. The local variable $u_\ell$ represents a thin local
peak of active $u$. That peak could grow (and lead to a polar pattern) in the regime where
the solid red curve is present. The LPA diagram demonstrates that, in some parameter
regimes, the stimulus peak must be sufficiently large, i.e., its size has to exceed a
threshold (dashed red curve), whereas other parameter regimes allow for patterning in
response to arbitrarily small stimuli (dot-dashed black curve). The latter regimes can be
identified with Turing instabilities. The former regimes are not discoverable by the usual
linear stability analysis (LSA) for Turing pattern formation, and detecting them is a
helpful aspect of LPA that goes beyond LSA.
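A minimal LPA sketch for the kinetics (4c) follows. It assumes the same illustrative parameters used elsewhere in these sketches ($b = 0.067$, $\gamma = 1$, $n = 2$) and a hypothetical mean total $T = 2.2$. The global pair $(u_g, v_g)$ obeys the well-mixed kinetics, while the thin local peak obeys $du_\ell/dt = f(u_\ell, v_g)$: fast diffusion makes the local inactive level equal to the global one, and the peak is too thin to change it. This follows the usual LPA construction but is our illustration, not code from [9, 15, 36].

```python
def f(u, v, b=0.067, gamma=1.0, n=2):
    """Reaction term (4c), with the inactivation rate rescaled to 1."""
    return (b + gamma * u**n / (1.0 + u**n)) * v - u


def lpa_response(total, kick, t_relax=200.0, t_end=200.0, dt=0.01):
    """Forward-Euler integration of the LPA ODEs; returns (u_g, v_g, u_l)."""
    u_g, v_g = 0.0, total
    for _ in range(round(t_relax / dt)):  # phase 1: settle on the uniform state
        r = f(u_g, v_g)
        u_g, v_g = u_g + dt * r, v_g - dt * r
    u_l = u_g + kick  # phase 2: add a thin local peak of the given size
    for _ in range(round(t_end / dt)):
        r_g, r_l = f(u_g, v_g), f(u_l, v_g)  # the peak sees the global pool v_g
        u_g, v_g = u_g + dt * r_g, v_g - dt * r_g
        u_l += dt * r_l
    return u_g, v_g, u_l


for kick in (0.05, 0.5):
    u_g, v_g, u_l = lpa_response(total=2.2, kick=kick)
    print(f"kick={kick}: u_g={u_g:.3f}, v_g={v_g:.3f}, u_l={u_l:.3f}")
```

For this total the uniform state is stable, yet a peak of size 0.5 grows to the high state while a peak of size 0.05 decays back to it: a finite-threshold regime of the kind that linear stability analysis would not reveal.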

Fig. 6 Methods of analysis and simulations: a Local perturbation analysis (LPA), a shortcut
bifurcation method, has helped to detect regimes of patterning in slow-fast reaction-diffusion sys-
tems. Here we show an example of how the basal activation rate b influences potential regimes of
wave-pinning and of Turing-type instability. See text and references [11, 12, 15] for details. b A
number of methods have been used to simulate polarization in 2D deforming domains representing
the “top-down” view of a cell (as in Fig. 1b). From top to bottom: A cellular-Potts model simula-
tion by A. F. M. Marée of a 2D deforming cell with an internal reaction-diffusion signaling circuit
(and an implicit reaction-diffusion solver) that includes GTPases, interacting lipids, actin, and other
components [21], the wave-pinning system (4) solved in an immersed-boundary method simulation
by Ben Vanderlei [35], by the level set and moving boundary node method by Zajac [7], and using
CompuCell3D by undergraduate summer research student Zachary Pellegrin

In our experience, solving the full PDEs with insights gained from LPA diagrams
makes it easier to identify the interesting parameter regimes. Details of the method
and its uses have been extensively described in [15]. Other useful shortcuts have
included “sharp-switch” approximations (Hill functions replaced by piecewise
constant functions), as in [12], and analysis of plateaus described in [36]. None of these
replace the need for simulating the PDEs, but all of them help to gain familiarity
with possible expected behaviours of the reaction-diffusion systems we have
investigated. Most recently, Andreas Buttenschön has created full numerical bifurcation
software for PDEs that permits much greater accuracy in tracking solution branches
[4]. The software builds on state-of-the-art well-conditioned collocation techniques
to discretize functions and their operators. Solution branches are continued using
a matrix-free Newton-Gauss method, for which rigorous convergence estimates are
available.
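To illustrate what a sharp-switch approximation buys, here is a sketch (our own, not taken from [12]) that replaces the Hill function in (4c) by a unit step at $u = 1$, $u^n/(1+u^n) \to H(u-1)$, which is the $n \to \infty$ limit of that Hill function for $u \neq 1$. The stable roots become explicit, and the middle root degenerates to the switching point $u = 1$:

$$f(u,v) \approx \big(b + \gamma H(u-1)\big)\,v - u, \qquad u_a = bv, \quad u_b = (b+\gamma)\,v,$$

which is bistable when $bv < 1 < (b+\gamma)v$, i.e. $\frac{1}{b+\gamma} < v < \frac{1}{b}$. The Maxwell integral of Condition 3 can then be evaluated by hand:

$$\int_{u_a}^{u_b} f(u,v)\,du = \int_{bv}^{(b+\gamma)v} (bv - u)\,du + \gamma v \int_{1}^{(b+\gamma)v} du = -\frac{\gamma^2 v^2}{2} + \gamma v\big((b+\gamma)v - 1\big) = \gamma v\left[\left(b + \frac{\gamma}{2}\right)v - 1\right],$$

so the integral changes sign at $v^* = \frac{1}{b + \gamma/2} = \frac{2}{2b + \gamma}$, which lies inside the bistable range.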


3.2 Simulating the PDEs in Dynamic Cell-Shaped Domains 


So far, analytic results were described in 1D domains that represent a cell transect. 
It is instructive to ask how the same systems behave in domains whose shape more 
closely relates to that of cells, and in particular, where the internal chemistry affects 
(and is affected by) the deforming cell. Based on the fact that cell fragments (radius
5-10 μm) without a nucleus, and with overall uniform thickness (≈0.2 μm), are
capable of motility, we take the liberty of reducing cell shape to its two-dimensional
“top-down” projection shown in Fig. 1b, d. We solve the governing equations (4)
or more detailed versions, in the 2D domain, and assume that the boundary of the 
domain is influenced by the local chemical activity level. For example, if u represents 
the level of activity of the GTPase Rac, it causes the boundary to be pushed outwards 
(via F-actin assembly), whereas Rho has the opposite effect (activating contraction
via myosin). 

A number of results obtained over the years by group members are illustrated in
Fig. 6b. In general, we found that the system (4), the simplest to understand analytically,
is not as robust computationally as other variants. Cross-talk between GTPases results
in larger parameter regimes for polarization. As an example, models consisting of four 
PDEs that describe the mutual antagonism between Rac and Rho [12] lead to greater 
robustness in 2D computations. An even more detailed variant, that includes several 
GTPases (Rac, Rho, Cdc42), as well as their effects on actin assembly and myosin 
contraction was capable of realistic behaviour such as directed motility (chemotaxis) 
[20]. The addition of a layer of signaling lipids (phosphoinositides) also permitted 
a simulated cell to rapidly select one front despite conflicting or competing stimuli 
[21]. 

Simulating the reaction-diffusion systems for GTPase signaling in deforming 
domains also reveals that evolving domain shape and level curves of the chemical 
system influence one another: the zero-flux boundary conditions impose constraints 
on the level curves that also accelerate the dynamics of the chemical redistribution 
when the domain deforms. Such findings were discussed in detail in [21]. 

For practical reasons, it is harder to simulate the same systems in 3D. However, 
recent work by the group of Anotida Madzvamuse [5] has extended these results to a 
coupled bulk-surface wave-pinning computation in a 3D cell-shaped static domain. 


3.3 Contact with Biological Experiments 


While details are beyond the scope of this summary, it is worth noting several direc- 
tions in which the mathematical modeling has contributed to understanding of exper- 
imental cell biology. 

William Bement (U Wisconsin) studies the patterns of GTPases (Rho and Cdc42)
that form spontaneously around sites of laser-inflicted wounds in frog eggs (Xeno- 
pus oocytes). The connectivity of these GTPases, and their crosstalk with proteins 
that activate or inactivate them (e.g. Abr) has been modeled by group members, 
including Cory Simon, Laura Liao, and William R Holmes. Combining models with 
experiments has helped to build an understanding of the biology [12, 13, 32]. 

The polarization of HeLa cells exposed to gradients that stimulate a graded 
response by the GTPase Rac was studied experimentally by Benjamin Lin, in the
Lab of Andre Levchenko [19]. A model for Cdc42, Rac, and Rho, interacting with 
one another and with the phosphoinositides PIP, PIP, and PIP3 explained the timing 
and strength of the response, and predicted results of experimental manipulations 
that affect parts of the crosstalk [14, 19]. 

Fig. 7 Extensions of the minimal model: a The simplest basic wave-pinning model of Eq. (4)
can produce a polarized pattern. b When the GTPase promotes assembly of F-actin, which then
promotes GTPase inactivation, waves and other exotic dynamics can be observed, provided the
negative feedback is on a slow time-scale [10, 22]. In a, b time increases along the vertical axis and
space is on the horizontal axis. c Some GTPases cause the cell to spread (Rac) or to shrink (Rho),
affecting cell tension. If the tension also affects GTPase activity, interesting dynamics are observed.
Shown is a time sequence (left to right) of a “tissue” composed of 370 cells, colour coded by their
internal GTPase activity. The cell size is correlated to that activity, as described in [37]

Experiments have been carried out on melanoma cells grown on microfabricated
surfaces that mimic the natural environment of cells (“extracellular matrix”). JinSeok
Park, of the Levchenko Lab at Yale University, found three typical motility
phenotypes, including persistently polarized, random, and oscillatory front-back cycling,
depending on levels of adhesion to the substrate, and manipulations that affect
activities of the GTPases or their downstream targets. We were able to account for the
observed phenotypes by a model for Rac-Rho mutual antagonism, weighted by
signals from the extracellular matrix substrate [16, 26, 29].


4 Extending the Minimal Model 


The wave-pinning model has been used as a nucleus from which we have expanded to 
larger circuits, and greater levels of biological detail. We showed that some properties
of the system (4) are shared by a circuit of the mutually antagonistic GTPases
Rac-Rho [12]. A notable common feature is the existence of parameter regimes in which
several states coexist. These include states of uniformly low activity, uniformly high 
activity, or polarized levels of activity. Which of these develops then depends on 
initial conditions. A recent contribution [38] extends these findings to more general 
model variants. 

A hallmark of the kinetics we described above is the presence of bistability in 
some parameter regimes, i.e. the existence of two stable steady states separated by an 
unstable one. Such systems also display hysteresis, or a kind of history-dependence: 
slowly increasing a parameter results in a sudden appearance of a new steady state 
at some transition point, but to reverse the process, the same parameter has to be 
decreased much beyond the transition point. The addition of feedback from a third
dynamic variable in such cases is known to produce the possibility of oscillations.
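The hysteresis just described can be seen in the well-mixed equation $du/dt = f(u, v)$ with $f$ from (4c) and $v$ treated as a slowly varied parameter. The sketch sweeps $v$ up and then down, always starting each run from the previous end state; the parameter values ($b = 0.067$, $\gamma = 1$, $n = 2$), the sweep range and the switching threshold 0.6 are illustrative assumptions.

```python
def f(u, v, b=0.067, gamma=1.0, n=2):
    """Reaction term (4c), with the inactivation rate rescaled to 1."""
    return (b + gamma * u**n / (1.0 + u**n)) * v - u


def relax(u, v, t=50.0, dt=0.02):
    """Forward-Euler integration of du/dt = f(u, v) with v held fixed."""
    for _ in range(round(t / dt)):
        u += dt * f(u, v)
    return u


def sweep(v_values, u_start):
    """Quasi-static sweep; reusing the previous end state creates history dependence."""
    u, path = u_start, []
    for v in v_values:
        u = relax(u, v)
        path.append(u)
    return path


def switch_point(v_values, path, test):
    """First parameter value at which the state satisfies `test`."""
    return next(v for v, u in zip(v_values, path) if test(u))


v_up = [1.5 + 0.01 * k for k in range(81)]  # 1.50 -> 2.30
v_down = v_up[::-1]                         # 2.30 -> 1.50
path_up = sweep(v_up, u_start=0.0)
path_down = sweep(v_down, u_start=path_up[-1])
jump_up = switch_point(v_up, path_up, lambda u: u > 0.6)
jump_down = switch_point(v_down, path_down, lambda u: u < 0.6)
print("jump up at v =", jump_up, "; jump down at v =", jump_down)
```

The upward jump occurs shortly after $v$ passes about 2.01 and the downward one just below about 1.75, the two folds of the bistable range, so the state at, say, $v = 1.9$ depends on the direction of the sweep.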
We examined several cases of this type, motivated by biological observations.
In one case, we studied feedback from F-actin to the inactivation of a GTPase, as
observed, for example, in [31]. Assuming slow negative feedback from F-actin (to
the inactivation of the GTPase), as shown in Fig. 7b, leads to interesting dynamics of
traveling waves and pulses in the domain [10, 22]. Feedback between the Rac-Rho
circuit and the extracellular matrix also results in oscillations, as previously described 
[16]. More recently, we also modeled the interplay between mechanical tension in 
the cell and the activity of GTPases, as observed experimentally by [17]. Here we 
assumed that GTPases such as Rho and Rac can affect cell spreading, which changes
the tension on the cell and feeds back to the activation of the GTPase. A typical 
circuit of this type is shown in Fig. 7c. As expected, such negative feedback is also 
consistent with regimes of oscillatory dynamics in individual cells, as demonstrated 
in [37]. Moreover, when cells with such behaviour are coupled to one another in 1D 
or in 2D (simulations in Fig. 7c), one observes waves of chemical activity coupled 
to cell-size changes as the “model tissue" undergoes the spatio-temporal dynamics 
so created. 


5 Discussion 


Cell biology presents an unlimited source of inspiring problems. The links between 
mathematics and cell biology are relatively recent, and not yet fully recognized. But 
the need for quantitative methods, computational platforms, and mathematical analy- 
sis of cellular phenomena promises to grow with time, presenting many opportunities 
for young applied mathematicians looking for problems to study. 

Here I have mainly described a toy model that we constructed to help us understand
cell polarization. The simplicity of the model made it mathematically tractable. Its 
analysis reveals several insights that were not a priori evident. First, with the right 
kind of positive feedback, we showed that a single GTPase could, on its own, lead 
to spontaneous polarization that explains cell directionality. In other words, it is 
not essential to have networks of such proteins to achieve this cellular process. 
Second, there is a functional purpose for the curious biology of GTPases: their 
cycling between membrane and cytosol is not a mere evolutionary artifact. We argue 
that this transition sets up the differences in diffusion between active and inactive 
GTPases—a difference that is crucial for polarization to be possible, according to 
our mathematical model. 

The motivation of cell polarity led us to mathematics with a surprising twist, 
uncovering the phenomenon of decelerating waves and wave-pinning that were not 
widely recognized before in the literature on reaction-diffusion systems. From this 
standpoint, we could argue that biology inspires new mathematics. The efforts to 
understand models that were so developed also resulted in a variety of methods that 
ease the analysis, among them LPA. Extensions of the basic wave-pinning model led 
to variants with more exotic patterns and waves. These were investigated in various 
geometries, in single cells, and finally, in interacting groups of cells to identify 
causes for cell size fluctuations in a tissue and for a variety of emergent phenomena
in single and collective cell motility. Finally, developing simple theoretical models
and in parallel considering biologically-inspired detailed models are not mutually
exclusive. Our experience in the former helps us with the latter, and vice versa.

Many still-unanswered questions can be posed. Among these are some of the fol- 
lowing: How does the internal GTPase state of a cell affect the outcome of interactions 
between cells, and how does contact between cells change their GTPase state? What 
are reasonable ways to model such cell-cell interactions leading to cell adhesion or 
cell separation? How is cell state coordinated in a multicellular tissue? What aspects 
of cell adhesion, mechanics, deformation, chemical secretion, and environmental 
topography (to name a few) affect and are affected by GTPase activities, and how 
should these be modelled? What methods of analysis can we develop to help with 
larger, more realistic models that have many interacting components? What aspects 
of 3D cell shape, and of cell motion in a 3D matrix lead to new phenomena, and what 
numerical methods should be developed to address such behaviours? Is there a com- 
promise between large-scale computations and mathematical analysis in these more 
challenging scenarios? In conclusion, the motility and interactions of cells are a rich
scientific area calling for investigation by applied mathematicians. Pattern formation 
inside living cells is merely one facet, while many other fundamental challenges are 
at hand. 
Private AI: Machine Learning on Encrypted Data


Kristin Lauter 


Abstract This paper gives an overview of my Invited Plenary Lecture at the Inter- 
national Congress of Industrial and Applied Mathematics (ICIAM) in Valencia in 
July 2019. 


1 Motivation: Privacy in Artificial Intelligence 


These days more and more people are taking advantage of cloud-based artificial intel- 
ligence (AI) services on their smart phones to get useful predictions such as weather, 
directions, or nearby restaurant recommendations based on their location and other 
personal information and preferences. The AI revolution that we are experiencing in
the high tech industry is based on the following value proposition: you input your 
private data and agree to share it with the cloud service in exchange for some use- 
ful prediction or recommendation. In some cases the data may contain extremely 
personal information, such as your sequenced genome, your health record, or your 
minute-to-minute location. 

This quid pro quo may lead to the unwanted disclosure of sensitive information 
or an invasion of privacy. Examples during the year of ICIAM 2019 include the case 
of the Strava fitness app which revealed the location of U.S. army bases world-wide, 
or the case of the city of Los Angeles suing IBM’s weather company over deceptive 
use of location data. It is hard to quantify the potential harm from loss of privacy, 
but employment discrimination or loss of employment due to a confidential health 
or genomic condition are potential undesirable outcomes. Corporations also have a 
need to protect their confidential customer and operations data while storing, using, 
and analyzing it. 

To protect privacy, one option is to lock down personal information by encrypting 
it before uploading it to the cloud. However, traditional encryption schemes do not 
allow for any computation to be done on encrypted data. In order to make useful 
predictions, we need a new kind of encryption which maintains the structure of the
data when encrypting it so that meaningful computation is possible. Homomorphic 
encryption allows us to switch the order of encryption and computation: we get the 
same result if we first encrypt and then compute, as if we first compute and then 
encrypt. 
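The “switch the order” property can be seen in a toy example. The sketch below uses the textbook Paillier cryptosystem, a much simpler scheme that is only additively homomorphic. It was chosen purely for brevity and is not the scheme behind the work described here. The tiny primes make it completely insecure, and all function names are hypothetical.

```python
import math
import secrets


def keygen(p=293, q=433):
    """Toy Paillier keys from two small primes, with generator g = n + 1 (insecure sizes)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # with g = n + 1, decryption needs lam^{-1} mod n
    return n, (lam, mu)


def encrypt(n, m):
    """Enc(m) = (n + 1)^m * r^n mod n^2, with fresh random r coprime to n."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return pow(n + 1, m, n2) * pow(r, n, n2) % n2


def decrypt(n, private_key, c):
    """Dec(c) = L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    lam, mu = private_key
    x = pow(c, lam, n * n)
    return (x - 1) // n * mu % n


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the hidden plaintexts (mod n)."""
    return c1 * c2 % (n * n)


n, sk = keygen()
c_sum = add_encrypted(n, encrypt(n, 42), encrypt(n, 100))  # computed without decrypting
print(decrypt(n, sk, c_sum))  # 142, the same as adding first and then encrypting
```

The party holding only `n` can compute `c_sum` without ever seeing 42 or 100; only the holder of the private key learns the result. Schemes used for machine learning inference must also support multiplication, which Paillier does not.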

The first solution for a homomorphic encryption scheme which can process any 
circuit was proposed in 2009 by Gentry [21]. Since then, many researchers in cryp- 
tography have worked hard to find schemes which are both practical and also based 
on well-known hard math problems. In 2011, my team at Microsoft Research collabo- 
rated on the homomorphic encryption schemes [8, 9] and many practical applications 
and improvements [30] which are now widely used in applications of Homomorphic 
Encryption. Then in 2016, we had a surprise breakthrough at Microsoft Research 
with the now widely cited CryptoNets paper [22], which demonstrated for the first 
time that evaluation of neural network predictions was possible on encrypted data. 

Thus began our Private AI project, the topic of my Invited Plenary Lecture at the 
International Congress of Industrial and Applied Mathematics in Valencia in July 
2019. Private AI refers to our Homomorphic Encryption-based tools for protecting 
the privacy of enterprise, customer, or patient data, while doing Machine Learning 
(ML)-based AI, both learning classification models and making valuable predictions 
based on such models. 

You may ask, “What is Privacy?” Preserving “Privacy” can mean different things 
to different people or parties. Researchers in many fields including social science and 
computer science have formulated and discussed definitions of privacy. My favorite 
definition of privacy is: a person or party should be able to control how and when their 
data is used or disclosed. This is exactly what Homomorphic Encryption enables. 


1.1 Real-World Applications 


In 2019, the British Royal Society released a report on Protecting privacy in practice: 
Privacy Enhancing Technologies in data analysis. The report covers Homomorphic 
Encryption (HE) and Secure Multi-Party Computation (MPC), but also technologies 
not built with cryptography, including Differential Privacy (DP) and secure hardware 
hybrid solutions. Our homomorphic encryption project was featured as a way to 
protect “Privacy as a human right” at the Microsoft Build world-wide developers 
conference in 2018 [39]. Private AI forms one of the pillars of Responsible ML in 
our collection of Responsible AI research, and Private Prediction notebooks were 
released in Azure ML at Build 2020. 

Over the last 8 years, my team has created demos of Private AI in action, running 
private analytics services in the Azure cloud. I showed a few of these demos in my talk 
at ICIAM in Valencia. Our applications include an encrypted fitness app, which is a 
cloud service which processes all your workout and fitness data and locations in the 
cloud in encrypted form, and displays your summary statistics to you on your phone 
after decrypting the results of the analysis locally. Another application shows an 
encrypted weather prediction app, which takes your encrypted zip-code and returns 
encrypted versions of the weather at your location to be decrypted and displayed to 
you on your phone. The cloud service never learns your location or what weather 
data was returned to you. Finally, I showed a private medical diagnosis application, 
which uploads an encrypted version of your Chest X-Ray image, and the medical 
condition is diagnosed by running image recognition algorithms on the encrypted 
image in the cloud, and returned in encrypted form to the doctor. 

Over the years, my team¹ has developed other Private AI applications, enabling 
private predictions such as sentiment analysis in text, cat/dog image classification, 
heart attack risk based on personal health data, neural net image recognition of 
hand-written digits, flowering time based on the genome of a flower, and pneumonia 
mortality risk using intelligible models. All of these operate on encrypted data in the 
cloud to make predictions, and return encrypted results in a fraction of a second. 

Many of these demos and applications have been inspired by collaborations with 
researchers in Medicine, Genomics, Bioinformatics, and Machine Learning. We have 
worked together with finance experts and pharmaceutical companies to demonstrate 
a range of ML algorithms operating on encrypted data. The UK Financial Conduct 
Authority (FCA) ran an international Hackathon in August 2019 to combat money- 
laundering with encryption technologies by allowing banks to share confidential 
information with each other. Since 2015, the annual iDASH competition has attracted 
teams from around the world to submit solutions to the Secure Genome Analysis 
Competition. Participants include researchers at companies such as Microsoft and 
IBM, start-up companies, and academics from the U.S., Korea, Japan, Switzerland, 
Germany, France, etc. The results provide benchmarks for the medical research 
community of the performance of encryption tools for preserving privacy of health 
and genomic data. 


2 What Is Homomorphic Encryption? 


I could say, “Homomorphic Encryption is encryption which is homomorphic.” But 
that is not very helpful without further explanation. Encryption is one of the building 
blocks of cryptography: encryption protects the confidentiality of information. In 
mathematical language, encryption is just a map which transforms plaintexts (unen- 
crypted data) into ciphertexts (encrypted data), according to some recipe. Examples 
of encryption include blockciphers, which take sequences of bits and process them 
in blocks, passing them through an S-box which scrambles them, and iterating that 
process many times. A more mathematical example is RSA encryption, which raises 
a message to a certain power modulo a large integer N, whose prime factorization 
is secret, $N = p \cdot q$, where p and q are large primes of equal size with certain 
properties. 


1 My collaborators on the SEAL team include: Kim Laine, Hao Chen, Radames Cruz, Wei Dai, Ran 
Gilad-Bachrach, Yongsoo Song, Shabnam Erfani, Sreekanth Kannepalli, Jeremy Tieman, Tarun 
Singh, Hamed Khanpour, Steven Chith, James French, with substantial contributions from interns 
Gizem Cetin, Kyoohyung Han, Zhicong Huang, Amir Jalali, Rachel Player, Peter Rindal, Yuhou 
Xia as well. 


A map which is homomorphic preserves the structure, in the sense that an operation 
on plaintexts should correspond to an operation on ciphertexts. In practice, that means 
that switching the order of operations preserves the outcome after decryption, i.e., 
encrypt-then-compute and compute-then-encrypt give the same answer. This property 
is described by the following diagram: 


Fig. 1 Homomorphic encryption (diagram: across the top, compute maps a, b to a × b; the encrypt arrows lead down to E(a), E(b); across the bottom, compute is applied to E(a), E(b)) 

Starting with two pieces of data, a and b, the functional outcome should be the 
same when following the arrows in either direction, across and then down (compute- 
then-encrypt), or down and then across (encrypt-then-compute): $E(a + b) = E(a) + E(b)$, 
with equality holding after decryption. If this diagram holds for two operations, addition 
and multiplication, then any circuit of AND and OR gates can be evaluated on data 
encrypted under the encryption map E. It is 
important to note that homomorphic encryption solutions provide for randomized 
encryption, which is an important property to protect against so-called dictionary 
attacks. This means that new randomness is used each time a value is encrypted, 
and it should not be computationally feasible to detect whether two ciphertexts are 
the encryption of the same plaintext or not. Thus the ciphertexts in the bottom right 
corner of the diagram need to be decrypted in order to detect whether they are equal. 

The above description gives a mathematical explanation of homomorphic encryption 
by defining its properties. To return to the motivation of Private AI, another way 
to describe homomorphic encryption is to explain the functionality that it enables. 
Figure 2 shows Homer-morphic encryption, where Homer Simpson is a jeweler 
tasked with making jewelry given some valuable gold. Here the gold represents 
some private data, and making jewelry is analogous to analyzing the data by applying 
some AI model. Instead of accessing the gold directly, the gold remains in a 
locked box, and the owner keeps the key to unlock the box. Homer can only handle 
the gold through gloves inserted in the box (analogous to handling only encrypted 
data). When Homer completes his work, the locked box is returned to the owner who 
unlocks the box to retrieve the jewelry. 
Protecting Data via Encryption: 
Homomorphic encryption 


1. Put your gold in a locked box. 
2. Keep the key. 

3. Let your jeweler work on it through a glove box. 
4. Unlock the box when the jeweler is done! 


Fig. 2 Homer-morphic encryption 


To connect to Fig. 1 above, outsourcing sensitive work to an untrusted jeweler 
(cloud) is like following the arrows down, across, and then up. First the data owner 
encrypts the data and uploads it to the cloud, then the cloud operates on the encrypted 
data, then the cloud returns the output to the data owner to decrypt. 


2.1 History 


Almost 5 decades ago, we already had an example of encryption which is homomor- 
phic for one operation: the RSA encryption scheme [36]. A message m is encrypted 
by raising it to the power e modulo N for fixed integers e and N. Thus the product 
of the encryptions of two messages $m_1$ and $m_2$ is $m_1^e m_2^e \equiv (m_1 m_2)^e \pmod{N}$, the encryption of $m_1 m_2$. It was an open 
problem for more than thirty years to find an encryption scheme which was homo- 
morphic with respect to two (ring) operations, allowing for the evaluation of any 
circuit. Boneh-Goh-Nissim [3] proposed a scheme allowing for unlimited additions 
and one multiplication, using the group of points on an elliptic curve over a finite 
field, along with the Weil pairing map to the multiplicative group of a finite field. 
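The following minimal Python sketch checks the multiplicative property of RSA described above. It uses textbook (unpadded) RSA with deliberately tiny primes. The values p = 61, q = 53, e = 17 and the helper names are illustrative assumptions, and these parameters are far too small to be secure. Multiplying two ciphertexts modulo N gives a ciphertext of the product of the messages, here 12 · 34 = 408, which is smaller than N.

```python
# Toy textbook RSA, for illustration only: tiny primes, no padding, insecure.
p, q = 61, 53                 # hypothetical small primes
N = p * q                     # public modulus, 3233
phi = (p - 1) * (q - 1)       # 3120
e = 17                        # public exponent, coprime to phi
d = pow(e, -1, phi)           # private exponent (modular inverse, Python 3.8+)

def encrypt(m):
    return pow(m, e, N)

def decrypt(c):
    return pow(c, d, N)

m1, m2 = 12, 34
c1, c2 = encrypt(m1), encrypt(m2)

# m1^e * m2^e = (m1 * m2)^e mod N: the product of ciphertexts encrypts m1 * m2.
c_prod = (c1 * c2) % N
print(decrypt(c_prod))        # 408
```

Adding the two ciphertexts does not in general yield an encryption of 12 + 34. This is why RSA is homomorphic for only one operation.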

In 2009, Gentry proposed the first homomorphic encryption scheme allowing, in 
theory, for evaluation of arbitrary circuits on encrypted data. However, it took several 
years before researchers found schemes which were implementable, relatively practical, 
and based on known hard mathematical problems. Today all the major homomorphic 
encryption libraries world-wide implement schemes based on the hardness 
of lattice problems. A lattice can be thought of as a discrete additive subgroup of 
Euclidean space, namely the set of all integer linear combinations of a set of linearly 
independent basis vectors. It comes with the operations of vector addition, integer scalar 
multiplication, and inner product, and its dimension, n, is the number of basis vectors. 
2.2 Lattice-Based Solutions 


The high-level idea behind current solutions for homomorphic encryption is as fol- 
lows. Building on an old and fundamental method of encryption, each message is 
blinded, by adding a random inner product to it: the inner product of a secret vector 
with a randomly generated vector. Historically, blinding a message with fresh ran- 
domness was the idea behind encryption via one-time pads, but those did not satisfy 
the homomorphic property. Taking inner products of vectors is a linear operation, but 
if homomorphic encryption involved only addition of the inner product, it would be 
easy to break using linear algebra. Instead, the encryption must also add some freshly 
generated noise to each blinded message, making it difficult to separate the noise 
from the secret inner product. The noise, or error, is selected from a fairly narrow 
Gaussian distribution. Thus the hard problem to solve becomes a noisy decoding 
problem in a linear space, essentially Bounded Distance Decoding (BDD) or a Clos- 
est Vector Problem (CVP) in a lattice. Decryption is possible with the secret key, 
because the decryptor can subtract the secret inner product and then the noise is small 
and is easy to cancel. 
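The blinding-plus-noise idea can be illustrated with a toy secret-key encryption of single bits in the style of LWE. This is a simplified textbook construction, not one of the BFV/BGV schemes discussed below. Several choices are illustrative assumptions:

- the parameters n = 16 and q = 2^15;
- uniform noise in [-4, 4], standing in for a narrow Gaussian;
- all function names.

Python's random module is not a cryptographic generator, so the sketch offers no security.

```python
import random

# Toy secret-key LWE-style encryption of single bits (illustrative, INSECURE).
n, q = 16, 2**15                            # hypothetical dimension and modulus
rng = random.Random(1)                      # not a cryptographic RNG
s = [rng.randrange(q) for _ in range(n)]    # secret vector

def inner(a, b):
    return sum(x * y for x, y in zip(a, b)) % q

def encrypt(bit):
    a = [rng.randrange(q) for _ in range(n)]      # fresh random vector
    e = rng.randint(-4, 4)                        # small noise
    b = (inner(a, s) + e + bit * (q // 2)) % q    # bit*q/2 blinded by <a, s> + e
    return a, b

def decrypt(ct):
    a, b = ct
    x = (b - inner(a, s)) % q                     # leaves bit*q/2 + e (mod q)
    return 1 if q // 4 <= x < 3 * q // 4 else 0   # near q/2 -> 1, near 0 -> 0

def add(ct1, ct2):
    """Adds the messages mod 2 (XOR) and adds the noise terms."""
    (a1, b1), (a2, b2) = ct1, ct2
    return [(x + y) % q for x, y in zip(a1, a2)], (b1 + b2) % q

c0, c1 = encrypt(0), encrypt(1)
print(decrypt(c0), decrypt(c1))    # 0 1
print(decrypt(add(c1, c1)))        # 0, since 1 XOR 1 = 0
print(decrypt(add(c0, c1)))        # 1
```

Decryption succeeds because the remaining noise is far smaller than q/4. After one addition, that noise is at most 8 in absolute value. Adding many ciphertexts accumulates noise, which is the same phenomenon that limits computation in the schemes used in practice.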

Although the above high-level description was formulated in terms of lattices, in 
fact the structure that we use in practice is a polynomial ring. A vector in a lattice 
of n dimensions can be thought of as a polynomial of degree less than n, where the 
n coordinates of the vector are the coefficients of the polynomial. The number rings 
used here are given as a quotient of Z[x], the polynomial ring with integer coefficients, 
by a monic irreducible polynomial f(x). The ring can be thought of as a lattice in 
$\mathbb{R}^n$ when embedded into Euclidean space via the canonical embedding. To make 
all objects finite, we consider these polynomial rings modulo a large prime q, which is 
often called the ciphertext modulus. 
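The following Python sketch multiplies two elements of $(\mathbb{Z}/q\mathbb{Z})[x]/(x^n + 1)$. It takes $f(x) = x^n + 1$ with n a power of 2, the choice used by the BFV scheme described below. Because $x^n \equiv -1$, any term of degree k ≥ n wraps around to degree k - n with its sign flipped. The schoolbook O(n^2) loop and the function names are illustrative choices. Optimized libraries typically use multiplication based on the number-theoretic transform instead.

```python
def polymul_mod(a, b, q):
    """Multiply a, b in (Z/qZ)[x]/(x^n + 1); coefficient lists, constant term first."""
    n = len(a)
    assert len(b) == n
    res = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            k = i + j
            if k < n:
                res[k] = (res[k] + ai * bj) % q
            else:
                # x^k = x^(k-n) * x^n = -x^(k-n), because x^n = -1
                res[k - n] = (res[k - n] - ai * bj) % q
    return res

def polyadd_mod(a, b, q):
    """Coefficient-wise addition modulo q."""
    return [(x + y) % q for x, y in zip(a, b)]

# n = 4, q = 17: x^3 * x = x^4 = -1, represented as 16 mod 17
print(polymul_mod([0, 0, 0, 1], [0, 1, 0, 0], 17))   # [16, 0, 0, 0]
```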


2.3 Encoding Data 


When thinking about practical applications, it becomes clear that real data first has 
to be embedded into the mathematical structure that the encryption map is applied 
to, the plaintext space, before it is encrypted. This encoding procedure must also be 
homomorphic in order to achieve the desired functionality. The encryption will be 
applied to the polynomial ring with integer coefficients modulo q, so real data must 
be embedded into this polynomial ring. 

In a now widely cited 2011 paper, “Can Homomorphic Encryption be Practical?” 
([30, Sect. 4.1]), we introduced a new way of encoding real data in the polynomial 
space which allowed for efficient arithmetic operations on real data, opening up a 
new direction of research focusing on practical applications and computations. The 
encoding technique was simple: embed an integer m as a polynomial whose ith 
coefficient is the ith bit of the binary expansion of m (using the ordering of bits 
so that the least significant bit is encoded as the constant term in the polynomial). 
This allows for direct multiplication of integers, represented as polynomials, 
instead of encoding and encrypting data bit-by-bit, which requires a deep circuit just 
to evaluate simple integer multiplication. When using this approach, it is important 
to keep track of the growth of the size of the output to the computation. In order to 
ensure correct decryption, we limit the total size of the polynomial coefficients to t. 
Note that each coefficient was a single bit to start with, and a sum of k of them grows 
to at most k. We obtain the correct decryption and decoding as long as q > t > k, 
so that the result does not wrap around modulo t. 
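The following minimal Python sketch shows this encoding, without any encryption. Integers are encoded by their binary digits, the resulting polynomials are multiplied as ordinary integer polynomials, and evaluating the product at x = 2 recovers the integer product. The function names are hypothetical. The sketch ignores reduction modulo $x^n + 1$, which is harmless while the degree of the product stays below n. The largest coefficient of the product is the quantity that must stay below t.

```python
def encode(m):
    """Coefficient i is bit i of m (least significant bit = constant term)."""
    return [int(b) for b in reversed(bin(m)[2:])]

def decode(coeffs):
    """Evaluate the polynomial at x = 2."""
    return sum(c * 2**i for i, c in enumerate(coeffs))

def poly_mul(a, b):
    """Ordinary product of integer polynomials (no reduction)."""
    res = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            res[i + j] += ai * bj
    return res

x, y = 13, 11                   # 13 = 1 + x^2 + x^3, 11 = 1 + x + x^3
prod = poly_mul(encode(x), encode(y))
print(prod)                     # [1, 1, 1, 3, 1, 1, 1]
print(decode(prod))             # 143 = 13 * 11
print(max(prod))                # 3: coefficients grow beyond single bits
```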

This encoding of integers as polynomials has two important implications, for 
performance and for storage overhead. In addition to enabling multiplication of 
floating point numbers via direct multiplication of ciphertexts (rather than requiring 
deep circuits to multiply data encoded bit wise), this technique also saves space by 
packing a large floating point number into a single ciphertext, reducing the storage 
overhead. These encoding techniques help to squash the circuits to be evaluated, and 
make the size expansion reasonable. However, they limit the possible computations 
in interesting ways, and so all computations need to be expressed as polynomials. 
The key factor in determining the efficiency is the degree of the polynomial to be 
evaluated. 
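Each ciphertext multiplication at most doubles the degree of what has been computed. A polynomial of degree d therefore needs multiplicative depth at least $\lceil \log_2 d \rceil$, and square-and-multiply achieves this bound for $x^d$. The Python sketch below illustrates this without any encryption. It tracks only (degree, depth) pairs, and its helper name is hypothetical.

```python
import math

def power_depth(d):
    """(degree, multiplicative depth) of x^d via right-to-left square-and-multiply.

    Values are tracked abstractly as (degree, depth) pairs: multiplying two of
    them adds the degrees and gives depth max(depth1, depth2) + 1.
    """
    assert d >= 1

    def mul(u, v):
        return (u[0] + v[0], max(u[1], v[1]) + 1)

    base, result = (1, 0), None      # base starts as x itself, at depth 0
    while d:
        if d & 1:
            result = base if result is None else mul(result, base)
        d >>= 1
        if d:
            base = mul(base, base)   # x^(2^k) by repeated squaring
    return result

for d in (2, 3, 4, 7, 8, 15, 16, 100):
    degree, depth = power_depth(d)
    assert degree == d and depth == math.ceil(math.log2(d))
    print(d, depth)
```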


2.4 Brakerski/Fan-Vercauteren Scheme (BFV) 


For completeness, I will describe one of the most widely used homomorphic encryp- 
tion schemes, the Brakerski/Fan-Vercauteren Scheme (BFV) [7, 20], using the lan- 
guage of polynomial rings. 


2.4.1 Parameters and Notation 
Let q > t be positive integers and n a power of 2. Denote $\Delta = \lfloor q/t \rfloor$. Define 

$$R = \mathbb{Z}[x]/(x^n + 1), \qquad R_q = R/qR = (\mathbb{Z}/q\mathbb{Z})[x]/(x^n + 1),$$

and $R_t = (\mathbb{Z}/t\mathbb{Z})[x]/(x^n + 1)$, where $\mathbb{Z}[x]$ is the set of polynomials with integer 
coefficients and $(\mathbb{Z}/q\mathbb{Z})[x]$ is the set of polynomials with integer coefficients in the 
range $[0, q - 1]$. 

In the BFV scheme, plaintexts are elements of $R_t$, and ciphertexts are elements 
of $R_q \times R_q$. Let $\chi$ denote a narrow (centered) discrete Gaussian error distribution. 
In practice, most implementations of homomorphic encryption use a Gaussian 
distribution with standard deviation $\sigma[\chi] \approx 3.2$. Finally, let $U_k$ denote the uniform 
distribution on $\mathbb{Z} \cap [-k/2, k/2)$. 
2.4.2 Key Generation 

To generate a public key, pk, and a corresponding secret key, sk, sample $s \leftarrow U_3^n$, 
$a \leftarrow U_q^n$, and $e \leftarrow \chi^n$. Each of s, a, and e is treated as an element of $R_q$, where the 
n coefficients are sampled independently from the given distributions. To form the 
public key-secret key pair, let 

$$\mathrm{pk} = \left([-(as + e)]_q,\ a\right) \in R_q^2, \qquad \mathrm{sk} = s,$$

where $[\cdot]_q$ denotes the (coefficient-wise) reduction modulo q. 


2.4.3 Encryption 


Let $m \in R_t$ be a plaintext message. To encrypt m with the public key 
$\mathrm{pk} = (p_0, p_1) \in R_q^2$, sample $u \leftarrow U_3^n$ and $e_1, e_2 \leftarrow \chi^n$. Consider u and $e_i$ as 
elements of $R_q$ as in key generation, and create the ciphertext 

$$\mathrm{ct} = \left([\Delta m + p_0 u + e_1]_q,\ [p_1 u + e_2]_q\right) \in R_q^2.$$


2.4.4 Decryption 


To decrypt a ciphertext $\mathrm{ct} = (c_0, c_1)$ given a secret key $\mathrm{sk} = s$, write 

$$\frac{t}{q}(c_0 + c_1 s) = m + v + bt,$$

where $c_0 + c_1 s$ is computed as an integer coefficient polynomial, and scaled by the 
rational number t/q. The polynomial b has integer coefficients, m is the underlying 
message, and v satisfies $\|v\|_\infty \ll 1/2$. Thus decryption is performed by evaluating 

$$m = \left[\left\lfloor \frac{t}{q}\,[c_0 + c_1 s]_q \right\rceil\right]_t,$$

where $\lfloor \cdot \rceil$ denotes rounding to the nearest integer. 


2.4.5 Homomorphic Computation 
Next we see how to enable addition and multiplication of ciphertexts. Addition is 
easy: we define an operation $\oplus$ between two ciphertexts $\mathrm{ct}_1 = (c_0, c_1)$ and 
$\mathrm{ct}_2 = (d_0, d_1)$ as follows: 

$$\mathrm{ct}_1 \oplus \mathrm{ct}_2 = \left([c_0 + d_0]_q,\ [c_1 + d_1]_q\right) \in R_q^2.$$

Denote this homomorphic sum by $\mathrm{ct}_{\mathrm{sum}} = (c_0^{\mathrm{sum}}, c_1^{\mathrm{sum}})$, and note that if 

$$\frac{t}{q}(c_0 + c_1 s) = m_1 + v_1 + b_1 t, \qquad \frac{t}{q}(d_0 + d_1 s) = m_2 + v_2 + b_2 t,$$

then 

$$\frac{t}{q}\left(c_0^{\mathrm{sum}} + c_1^{\mathrm{sum}} s\right) = [m_1 + m_2]_t + v_1 + v_2 + b_{\mathrm{sum}}\, t$$

for some polynomial $b_{\mathrm{sum}}$ with integer coefficients. As long as $\|v_1 + v_2\|_\infty < 1/2$, 
the ciphertext $\mathrm{ct}_{\mathrm{sum}}$ is a correct encryption of $[m_1 + m_2]_t$. 

Similarly, there is an operation $\otimes$ between two ciphertexts that results in a ciphertext 
decrypting to $[m_1 m_2]_t$, as long as $\|v_1\|_\infty$ and $\|v_2\|_\infty$ are small enough. Since $\otimes$ 
is more difficult to describe than $\oplus$, we refer the reader to [20] for details. 
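The Python sketch below makes key generation, encryption, decryption, and homomorphic addition concrete, following the formulas above. It is a toy and insecure. Several choices are illustrative assumptions:

- the parameters n = 16, q = 2^20, t = 16;
- a rounded Gaussian clipped at six standard deviations, standing in for χ;
- the seed and all function names.

This is not Microsoft SEAL's API, and multiplication (⊗) is omitted, as in the text. The noise helper computes $\|v\|_\infty$ for a ciphertext that decrypts correctly.

```python
import random
from fractions import Fraction

# Toy BFV over R_q = (Z/qZ)[x]/(x^n + 1), plaintexts in R_t. INSECURE, illustrative.
n, q, t = 16, 2**20, 16          # hypothetical parameters
delta = q // t                   # Delta = floor(q/t)
rng = random.Random(2019)        # not a cryptographic RNG

def poly_mul(a, b):
    """Product in Z[x]/(x^n + 1): x^n = -1; coefficients are not reduced mod q."""
    res = [0] * n
    for i in range(n):
        for j in range(n):
            if i + j < n:
                res[i + j] += a[i] * b[j]
            else:
                res[i + j - n] -= a[i] * b[j]
    return res

def poly_add(*polys):
    return [sum(cs) for cs in zip(*polys)]

def mod_q(a):
    """Coefficient-wise reduction [.]_q, representatives in [0, q)."""
    return [c % q for c in a]

def sample_ternary():            # U_3^n: coefficients in {-1, 0, 1}
    return [rng.randint(-1, 1) for _ in range(n)]

def sample_error():              # stand-in for chi: rounded Gaussian, sigma 3.2, clipped
    return [max(-19, min(19, round(rng.gauss(0, 3.2)))) for _ in range(n)]

def keygen():
    s = sample_ternary()
    a = [rng.randrange(q) for _ in range(n)]          # uniform modulo q
    e = sample_error()
    p0 = mod_q([-c for c in poly_add(poly_mul(a, s), e)])
    return (p0, a), s                                 # pk = ([-(as + e)]_q, a), sk = s

def encrypt(m, pk):
    p0, p1 = pk
    u, e1, e2 = sample_ternary(), sample_error(), sample_error()
    c0 = mod_q(poly_add([delta * mi for mi in m], poly_mul(p0, u), e1))
    c1 = mod_q(poly_add(poly_mul(p1, u), e2))
    return c0, c1

def phase(ct, s):
    """[c0 + c1*s]_q."""
    c0, c1 = ct
    return mod_q(poly_add(c0, poly_mul(c1, s)))

def decrypt(ct, s):
    # m = [ round(t/q * [c0 + c1 s]_q) ]_t, with exact integer rounding
    return [((t * x + q // 2) // q) % t for x in phase(ct, s)]

def add(ct1, ct2):
    """Homomorphic addition: coefficient-wise sums reduced mod q."""
    return tuple(mod_q(poly_add(a, b)) for a, b in zip(ct1, ct2))

def noise(ct, s):
    """||v||_inf: distance of t/q * [c0 + c1 s]_q from the nearest integers."""
    return max(abs(Fraction(t * x, q) - round(Fraction(t * x, q))) for x in phase(ct, s))

pk, sk = keygen()
m1 = [3] + [0] * (n - 1)         # the constant polynomial 3
m2 = [15, 1] + [0] * (n - 2)     # 15 + x
ct1, ct2 = encrypt(m1, pk), encrypt(m2, pk)
ct_sum = add(ct1, ct2)
print(decrypt(ct1, sk) == m1, decrypt(ct2, sk) == m2)   # True True
print(decrypt(ct_sum, sk)[:2])                           # [2, 1]: 3 + 15 = 18 = 2 mod 16
print(float(noise(ct1, sk)), float(noise(ct_sum, sk)))   # both far below 1/2
```

Choosing q divisible by t makes $\Delta t = q$ exactly, which keeps the sketch simple. With general q, an extra rounding term also enters the noise.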


2.4.6 Noise 


In the decryption formula presented above the polynomial v with rational coefficients 
is assumed to have infinity-norm less than 1/2. Otherwise, the plaintext output by 
decryption will be incorrect. Given a ciphertext $\mathrm{ct} = (c_0, c_1)$ which is an encryption 
of a plaintext m, let $v \in \mathbb{Q}[x]/(x^n + 1)$ be such that 

$$\frac{t}{q}(c_0 + c_1 s) = m + v + bt.$$

The infinity norm of the polynomial v is called the noise, and the ciphertext decrypts 
correctly as long as the noise is less than 1/2. 

When operations such as addition and multiplication are applied to encrypted data, 
the noise in the result may be larger than the noise in the inputs. This noise growth 
is very small in homomorphic additions, but substantially larger in homomorphic 
multiplications. Thus, given a specific set of encryption parameters $(n, q, t, \chi)$, one 
can only evaluate computations of a bounded size (or bounded multiplicative depth). 
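For additions alone, the bound for the homomorphic sum given earlier can be iterated. Assume each of k ciphertexts, encrypting $m_1, \dots, m_k$, has noise at most B. The homomorphic sum then satisfies the following, which is a worst-case estimate (actual noise is often smaller):

```latex
\[
\frac{t}{q}\Bigl(c_0^{\mathrm{sum}} + c_1^{\mathrm{sum}} s\Bigr)
  = \Bigl[\sum_{i=1}^{k} m_i\Bigr]_t + \sum_{i=1}^{k} v_i + b_{\mathrm{sum}}\, t,
\qquad
\Bigl\|\sum_{i=1}^{k} v_i\Bigr\|_\infty \le \sum_{i=1}^{k} \|v_i\|_\infty \le kB .
\]
```

Decryption is therefore guaranteed to be correct whenever kB < 1/2. For instance, take a hypothetical fresh noise level $B = 2^{-10}$. Sums of up to 511 ciphertexts then stay within this bound, since $511 \cdot 2^{-10} < 1/2$.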

A precise estimate of the noise growth for the YASHE scheme was given in [4] and 
these estimates were used in [5] to give an algorithm for selecting secure parameters 
for performing any given computation. Although the specific noise growth estimates 
needed for this algorithm do depend on which homomorphic encryption scheme is 
used, the general idea applies to any scheme. 


2.5 Other Homomorphic Encryption Schemes 


In 2011, researchers at Microsoft Research and the Weizmann Institute published the 
BV/BGV homomorphic encryption schemes [8, 9], which are used by teams around 
the world today. In 2013, IBM released HELib, a homomorphic encryption library 
for research purposes, which implemented the BGV scheme. HELib is written in 
C++ and uses the NTL mathematical library. The Brakerski/Fan-Vercauteren (BFV) 
scheme described above was proposed in 2012. Alternative schemes with different 
security and error-growth properties were proposed in 2012 by Lopez-Alt, Tromer, 
and Vaikuntanathan (LTV [33]), and in 2013 by Bos, Lauter, Loftus, and Naehrig 
(YASHE [4]). The Cheon-Kim-Kim-Song (CKKS [14]) scheme was introduced in 
2016, enabling approximate computation on ciphertexts. 

Other schemes [16, 19], which operate bit-by-bit, are more efficient for logical 
tasks such as comparison. Current research attempts to make it practical to switch 
between such schemes to enable both arithmetic and logical operations efficiently [6]. 


2.6 Microsoft SEAL 


Early research prototype libraries were developed by the Microsoft Research (MSR) 
Cryptography group to demonstrate the performance numbers for initial applications 
such as those developed in [4, 5, 23, 29]. But due to requests from the biomedical 
research community, it became clear that it would be very valuable to develop a well- 
engineered library which would be widely usable by developers to enable privacy 
solutions. The Simple Encrypted Arithmetic Library (SEAL) [37] was developed in 
2015 by the MSR Cryptography group with this goal in mind, and is written in C++. 
Microsoft SEAL was publicly released in November 2015, and was released open 
source in November 2018 for commercial use. It has been widely adopted by teams 
worldwide and is freely available online (http://sealcrypto.org). 

Microsoft SEAL aims to be easy to use for non-experts, and at the same time 
powerful and flexible for expert use. SEAL maintains a delicate balance between 
usability and performance, but is extremely fast due to high-quality engineering. 
SEAL is extensively documented, and has no external dependencies. Other publicly 
available libraries include HELib from IBM, PALISADE by Duality Technologies, 
and HEAAN from Seoul National University. 


2.7 Standardization of Homomorphic Encryption [1] 


When new public key cryptographic primitives are introduced, historically there 
has been roughly a 10-year lag in adoption across the industry. In 2017, Microsoft 
Research Outreach and the MSR Cryptography group launched a consortium for 
advancing the standardization of homomorphic encryption technology, together with 
our academic partners, researchers from government and military agencies, and partners 
and customers from various industries: HomomorphicEncryption.org. The first 
workshop was hosted at Microsoft in July 2017, and developers for all the existing 
implementations around the world were invited to demo their libraries. 
At the July 2017 workshop, we worked in groups to draft three white papers on 
Security, Applications, and APIs. We then worked with all relevant stakeholders of 
the HE community to revise the Security white paper [11] into the first draft standard 
for homomorphic encryption [1]. The Homomorphic Encryption Standard (HES) 
specifies secure parameters for the use of homomorphic encryption. The draft stan- 
dard was initially approved by the HomomorphicEncryption.org community at the 
second workshop at MIT in March 2018, and then was finalized and made publicly 
available at the third workshop in October 2018 at the University of Toronto [1]. A 
study group was initiated in 2020 at ISO, the International Organization for 
Standardization, to consider next steps for standardization. 


3 What Kind of Computation Can We Do? 


3.1 Statistical Computations 


In early work, we focused on demonstrating the feasibility of statistical computations 
on health and genomic data, because privacy concerns are obvious in the realm of 
health and genomic data, and statistical computations are an excellent fit for efficient 
HE because they have very low depth. We demonstrated HE implementations and 
performance numbers for statistical computations in genomics such as the chi-square 
test, Cochran-Armitage Test for Trend, and Haplotype Estimation Maximization [29]. 
Next, we focused on string matching, using the Smith-Waterman algorithm for edit 
distance [15], another task which is frequently performed for genome sequencing 
and the study of genomic disease. 


3.2 Heart Attack Risk 


To demonstrate operations on health data, in 2013 we developed a live demo pre- 
dicting the risk of having a heart attack based on six health characteristics [5]. We 
evaluated predictive models developed over decades in the Framingham Heart study, 
using the Cox proportional hazards method. I showed the demo live to news reporters 
at the 2014 AAAS meeting, and our software processed my risk for a heart attack in 
the cloud, operating on encrypted data, in a fraction of a second. 

In 2016, we started a collaboration with Merck to demonstrate the feasibility 
of evaluating such models on large patient populations. Inspired by our published 
work on heart attack risk prediction [5], they used SEAL to demonstrate running the 
heart attack risk prediction on one million patients from an affiliated hospital. Their 
implementation returned the results for all patients in about 2 hours, compared to 10 minutes 
for the same computation on unencrypted patient data. 
3.3 Cancer Patient Statistics 


In 2017, we began a collaboration with Crayon, a Norwegian company that develops 
health record systems. The goal of this collaboration was to demonstrate the value of 
SEAL in a real world working environment. Crayon reproduced all computations in 
the 2016 Norwegian Cancer Report using SEAL and operating on encrypted inputs. 
The report processed the cancer statistics from all cancer patients in Norway collected 
over the last roughly 5 decades. 


3.4 Genomic Privacy 


Engaging with a community of researchers in bioinformatics and biostatistics who 
were concerned with patient privacy issues led to a growing interdisciplinary com- 
munity interested in the development of a range of cryptographic techniques to apply 
to privacy problems in the health and biological sciences arenas [18]. One measure 
of the growth of this community over the last five years has been participation in 
the iDASH Secure Genome Analysis Competition, a series of annual international 
competitions funded by the National Institutes of Health (NIH) in the U.S. The 
iDASH competition has included a track on Homomorphic Encryption for the last 
five years (2015-2019), and our team from MSR submitted winning solutions for the 
competition in 2015 [27] and 2016 [10]. The tasks were the chi-square test, modified 
edit distance, database search, training logistic regression models, and genotype 
imputation. Each year, roughly 5-10 teams from research groups around the world 
submitted solutions for the task, which were benchmarked by the iDASH team. 
These results provide the biological data science community and NIH with real and 
evolving measures of the performance and capability of homomorphic encryption to 
protect the privacy of genomic data sets while in use. Summaries of the competitions 
are published in [38, 40]. 


3.5 Machine Learning: Training and Prediction 


The 2013 “ML Confidential” paper [23] was the first to propose training ML algo- 
rithms on homomorphically encrypted data and to show initial performance numbers 
for simple models such as linear means classifiers and gradient descent. Training is 
inherently challenging because of the large and unknown amount of data to be pro- 
cessed. 

Prediction tasks, on the other hand, process an input and model of known size, so 
many can be processed efficiently. For example, in 2016 we developed a demo using 
SEAL to predict the flowering time for a flower. The model processed 200,000 SNPs 
from the genome of the flower, and evaluated a Fast Linear Mixed Model (LMM). 
Including the round-trip communication time with the cloud running the demo as a 
service in Azure, the prediction was obtained in under a second. 

Another demo developed in 2016 using SEAL predicted the mortality risk for 
pneumonia patients based on 46 characteristics from the medical record for the 
patient. The model in this case is an example of an intelligible model and consists 
of 46 degree-4 polynomials to be evaluated on the patient’s data. Data from 4,096 patients 
can be batched together, and the prediction for all 4,096 patients was returned by 
the cloud service in a few seconds (in 2016). 

These two demos evaluated models which were represented by shallow circuits, 
linear in the first case and degree 4 in the second case. Other models such as deep 
neural nets (DNNs) are inherently more challenging because the circuits are so deep. 
To enable efficient solutions for such tasks requires a blend of cryptography and 
ML research, aimed at designing and testing ways to process data which allow for 
efficient operations on encrypted data while maintaining accuracy. An example of 
that was introduced in CryptoNets [22], showing that the activation function in the 
layers of the neural nets can be approximated with a low-degree polynomial function 
($x^2$) without significant loss of accuracy. 
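The following minimal sketch illustrates the idea on a tiny fully connected layer with made-up integer weights. These weights are hypothetical values, not taken from CryptoNets. Replacing a nonlinear activation by squaring keeps the whole prediction a polynomial of degree 2 in the inputs. Under encryption, the weights would enter as plaintext multipliers, and only the squaring would need a ciphertext-by-ciphertext multiplication. The sketch itself runs on plaintext integers.

```python
# Toy two-layer model with a square activation; all weights are hypothetical.
W1 = [[1, -2, 0], [3, 1, -1]]   # 2 hidden units, 3 inputs
b1 = [1, 0]
W2 = [2, -1]

def square(z):
    # Low-degree polynomial used in place of a nonlinear activation.
    return z * z

def predict(x):
    hidden = [square(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return sum(w * h for w, h in zip(W2, hidden))

print(predict([1, 2, 3]))       # 4: hidden pre-activations -2 and 2, squared to 4 and 4
```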

The CryptoNets paper was the first to show the evaluation of neural net predictions 
on encrypted data, and used the techniques introduced there to classify 
hand-written digits from the MNIST [31] data set. Many teams have since worked 
on improving the performance of CryptoNets, either with hybrid schemes or other 
optimizations [17, 25, 35]. In 2018, in collaboration with Median Technologies, 
we demonstrated deep neural net predictions for a medical image recognition task: 
classification of liver tumors based on medical images. 

Returning to the challenge of training ML algorithms, the 2017 iDASH contest 
task required the teams to train a logistic regression model on encrypted data. The 
data set provided for the competition was very simple and did not require many 
iterations to train an effective model (the winning solution used only 7 iterations [26, 
28]). The MSR solution [12] computed over 300 iterations and was fully scalable 
to any arbitrary number of iterations. We also applied our solution to a simplified 
version of the MNIST data set to demonstrate the performance numbers. 

Performance numbers for all computations described here were published at the 
time of discovery. They would need to be updated now with the latest version of 
SEAL, or can be estimated. Hardware acceleration techniques using state-of-the-art 
FPGAs can be used to improve the performance further ([34]). 


4 How Do We Assess Security? 


The security of all homomorphic encryption schemes described in this article is based 
on the mathematics of lattice-based cryptography, and the hardness of well-known 
lattice problems in high dimensions, problems which have been studied for more than 
25 years. Compare this to the age of other public key systems such as RSA (1977) 
or Elliptic Curve Cryptography (ECC, 1985). Cryptographic applications of lattices 
were first proposed by Hoffstein, Pipher, and Silverman [24] 
in 1996 and led them to launch the company NTRU. New hard problems such as 
LWE were proposed in the period 2004-2010, and their hardness was shown, via 
reductions, to rest on older problems which had been studied already for several 
decades: the Approximate Shortest Vector Problem (SVP) and Bounded Distance Decoding. 

The best known algorithms for attacking the Shortest Vector Problem or the Clos- 
est Vector Problem are called lattice basis reduction algorithms, and they have a more 
than 30-year history, including the LLL algorithm [32]. LLL runs in polynomial time, 
but only finds an exponentially bad approximation to the shortest vector. More recent 
improvements, such as BKZ 2.0 [13], involve exponential algorithms such as sieving 
and enumeration. Hard Lattice Challenges were created by TU Darmstadt and are 
publicly available online for anyone to try to attack and solve hard lattice problems 
of larger and larger size for the record. 
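The simplest case of lattice basis reduction is Lagrange-Gauss reduction of a two-dimensional basis. It finds a shortest nonzero vector exactly in dimension 2, and LLL generalizes it to higher dimensions. This standard textbook algorithm is offered only for illustration. It is not one of the attacks cited above, and it says nothing about hardness in the high dimensions used by homomorphic encryption.

```python
from fractions import Fraction

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def gauss_reduce(b1, b2):
    """Lagrange-Gauss reduction of a basis (b1, b2) of a 2-D integer lattice.

    Returns a basis of the same lattice whose first vector is a shortest
    nonzero lattice vector.
    """
    if dot(b1, b1) > dot(b2, b2):
        b1, b2 = b2, b1
    while True:
        # Size-reduce b2 against b1 by the nearest-integer projection coefficient.
        mu = round(Fraction(dot(b1, b2), dot(b1, b1)))
        b2 = (b2[0] - mu * b1[0], b2[1] - mu * b1[1])
        if dot(b2, b2) >= dot(b1, b1):
            return b1, b2
        b1, b2 = b2, b1

print(gauss_reduce((1, 2), (3, 4)))   # ((1, 0), (0, 2))
```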

Homomorphic Encryption scheme parameters are set such that the best known 
attacks take exponential time (exponential in the dimension of the lattice, n, meaning 
roughly $2^n$ time). These schemes have the advantage that there are no known polynomial 
time quantum attacks, which means they are good candidates for Post-Quantum 
Cryptography (PQC) in the ongoing 5-year NIST PQC competition. 

Lattice-based cryptography is currently under consideration for standardization in 
the ongoing NIST Post-Quantum Cryptography competition. Most Homomorphic 
Encryption deployments use small secrets as an optimization, so it is important to 
understand the concrete security when sampling the secret from a non-uniform, small 
distribution. There are numerous heuristics used to estimate the running time and 
quality of lattice reduction algorithms such as BKZ 2.0. The Homomorphic Encryption 
Standard recommends parameters based on the heuristic running time of the 
best known attacks, as estimated in the online LWE Estimator [2]. 


5 Conclusion 


Homomorphic Encryption is a technology which allows meaningful computation on 
encrypted data, and provides a tool to protect privacy of data in use. A primary appli- 
cation of Homomorphic Encryption is secure and confidential outsourced storage 
and computation in the cloud (i.e. a data center). A client encrypts their data locally,
keeps the encryption key(s) locally, and uploads the encrypted data to the cloud for
long-term storage and analysis. The cloud processes the encrypted data without decrypting it,
and returns encrypted answers to the client for decryption. The cloud learns nothing 
about the data other than the size of the encrypted data and the size of the computa- 
tion. The cloud can process Machine Learning or Artificial Intelligence (ML or AI) 
computations, either to make predictions based on known models or to train new 
models, while preserving the client’s privacy. 
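As a concrete and deliberately insecure illustration of this workflow, the following sketch implements a toy symmetric LWE-style bit encryption in the spirit of Regev's scheme, with a small ternary secret as discussed above. It is not the scheme of any particular HE library. The modulus Q, the dimension N, the noise range and all function names are hypothetical choices. The client holds the secret s, the cloud only adds ciphertexts, and decryption recovers the XOR of the plaintext bits.

```python
import random

Q = 3329  # toy modulus (assumption; far too small for real security)
N = 16    # toy lattice dimension (assumption; insecure)


def keygen(rng):
    """Client: sample a small ternary secret s in {-1, 0, 1}^N."""
    return [rng.choice((-1, 0, 1)) for _ in range(N)]


def encrypt(bit, s, rng):
    """Client: encrypt one bit as (a, b) with b = <a, s> + e + bit * (Q // 2)."""
    a = [rng.randrange(Q) for _ in range(N)]
    e = rng.randint(-2, 2)  # small noise term
    b = (sum(ai * si for ai, si in zip(a, s)) + e + bit * (Q // 2)) % Q
    return a, b


def add(c1, c2):
    """Cloud: add ciphertexts without the key; plaintext bits are XORed."""
    (a1, b1), (a2, b2) = c1, c2
    return [(x + y) % Q for x, y in zip(a1, a2)], (b1 + b2) % Q


def decrypt(c, s):
    """Client: remove <a, s>; the remainder is near 0 (bit 0) or Q/2 (bit 1)."""
    a, b = c
    v = (b - sum(ai * si for ai, si in zip(a, s))) % Q
    return 1 if Q // 4 < v < 3 * Q // 4 else 0


rng = random.Random(2019)
s = keygen(rng)                                  # client keeps s
c1, c2 = encrypt(1, s, rng), encrypt(1, s, rng)  # client uploads c1, c2
c_sum = add(c1, c2)                              # cloud never sees s
print(decrypt(c_sum, s))                         # 1 XOR 1 = 0
```

Each addition sums the noise terms, so after enough operations the remainder drifts away from 0 or Q/2 and decryption fails. Real parameter selection has to balance this noise growth against security.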

Current solutions for HE are implemented in 5-6 major open source libraries 
world-wide. The Homomorphic Encryption Standard [1] for using HE securely was 
approved in 2018 by HomomorphicEncryption.org, an international consortium of
researchers in industry, government, and academia. 

Today, applied Homomorphic Encryption remains an exciting direction in cryp- 
tography research. Several big and small companies, government contractors, and 
academic research groups are enthusiastic about the possibilities of this technol- 
ogy. With new algorithmic improvements, new schemes, an improved understanding 
of concrete use-cases, and an active standardization effort, wide-scale deployment 
of homomorphic encryption seems possible within the next 2-5 years. Small-scale 
deployment is already happening. 

Computational performance, memory overhead, and the limited set of operations
available in most libraries remain the main challenges. Most homomorphic encryption
schemes are inherently parallelizable, and exploiting this parallelism is essential for
good performance. Thus, easily parallelizable arithmetic computations seem
to be the most amenable to homomorphic encryption at this time, and it seems plausible
that initial wide-scale deployment may be in applications of Machine Learning
to enable Private AI.
Mathematical Approaches for Contemporary Materials Science:
Addressing Defects in the Microstructure


Claude Le Bris 


Abstract We overview a series of mathematical works that introduce new modeling 
and computational approaches for non-periodic materials and media. The approaches 
consider various types of defects embedded in a periodic structure, which can be 
either deterministic or random in nature. A portfolio of possible computational tech- 
niques addressing the identification of the homogenized properties of the material or 
the determination of the actual multi-scale solution is presented. 


1 Introduction 


1.1 Contemporary Materials Science 


The works outlined in the present review have been motivated by the following 
two-fold observation. In the past couple of decades, what we believe to be the most 
spectacular changes in materials science are 


(i) the increasing multi-scale nature of the materials considered: materials used 
to be mostly considered at one single scale, the effect of the finer scales being 
only phenomenologically accounted for in the model at the largest scale; when 
absolutely necessary, the effect of some micro-scale structure was explicitly con- 
sidered, but then it was at most for one such scale and almost exclusively sequen- 
tially: information was passed from the micro-scale to the macro-scale; modern 
materials science increasingly explicitly and concurrently considers models of 
a given material at many different scales. 

(ii) the increasingly imperfect character of the materials considered: more and
more often, deterministic or random sources of disorder are considered within an
ordered phase: the simplicity of periodic structures is no longer a valid approximation
for the degree of practical relevance and accuracy that modern materials science
requires; crystalline materials are actually polycrystalline materials
and consist of mono-crystalline grains, each of them possibly of a different crystalline
structure, each crystalline structure being itself flawed, sprinkled with
defects and dislocations; the imperfections, or violations of periodicity, affect
every possible scale, and actually cut through scales.


As a result, the real materials that contemporary materials scientists have to model
have a multi-scale, imperfect, possibly random nature. Such materials have several
characteristic length-scales that may differ from one another by orders of magnitude
but must be accounted for simultaneously. At possibly each such scale, they
have defects. Their qualitative and quantitative response might therefore differ
significantly from the idealized scenario long considered.


Our intent here is to present several mathematical and numerical endeavors that 
aim to better model, understand and simulate non-periodic multi-scale problems. 


The specific theoretical context in which we develop our discussion is homogenization
of simple, second order elliptic equations in divergence form with highly
oscillatory coefficients:

$$-\operatorname{div}\left[A_\varepsilon(x)\,\nabla u^\varepsilon\right] = f, \tag{1}$$

in a domain $D \subset \mathbb{R}^d$, with, say, homogeneous Dirichlet boundary conditions $u^\varepsilon = 0$
on $\partial D$. This particular case is to be thought of as a prototypical case. It is intuitively
clear that the same approaches carry over to other settings. Current works are indeed 
directed toward extending many of the considerations here to other types of equations, 
as will be clear in the exposition below. 

We conclude this introductory section with a quick presentation of the classical 
theory. The reader familiar with this theory may of course skip the presentation and 
directly proceed to Sect. 2. 


1.2 Basics of Homogenization Theory 


1.2.1 Periodic Homogenization 


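To begin with, we recall some well known, basic ingredients of elliptic homogenization
theory in the periodic setting; see the classical references [8, 29, 42] for more
details, or an overview in [1, Chap. 1]. We consider the problem

$$\begin{cases} -\operatorname{div}\left[A_{per}\left(\frac{x}{\varepsilon}\right)\nabla u^\varepsilon\right] = f & \text{in } D,\\ u^\varepsilon = 0 & \text{on } \partial D, \end{cases} \tag{2}$$

where the matrix $A_{per}$ is $\mathbb{Z}^d$-periodic, bounded and bounded away from zero, and
(for simplicity) symmetric. The corrector problem associated to Eq. 2 reads, for $p$
fixed in $\mathbb{R}^d$,

$$\begin{cases} -\operatorname{div}\left(A_{per}(y)\left(p + \nabla w_{per,p}(y)\right)\right) = 0,\\ w_{per,p} \text{ is } \mathbb{Z}^d\text{-periodic}. \end{cases} \tag{3}$$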
It has a unique solution up to the addition of a constant. This solution is meant to
describe prototypical fine oscillations of the exact solution $u^\varepsilon$ for $\varepsilon$ small. Then, the
homogenized coefficients read

$$[A^*_{per}]_{ij} = \int_Q e_i^T A_{per}(y)\left(e_j + \nabla w_{per,e_j}(y)\right) dy, \tag{4}$$

where $Q$ is the unit cube and $e_i$, $1 \le i \le d$, are the canonical vectors of $\mathbb{R}^d$. The
main result of periodic homogenization theory for Eq. 2 is that, as $\varepsilon$ vanishes, the
solution $u^\varepsilon$ to Eq. 2 converges to the solution $u^*$ to

$$\begin{cases} -\operatorname{div}\left[A^*_{per}\nabla u^*\right] = f & \text{in } D,\\ u^* = 0 & \text{on } \partial D. \end{cases} \tag{5}$$

The convergence holds in $L^2(D)$, and weakly in $H^1_0(D)$. The correctors $w_{per,e_i}$ may
then also be used to “correct” $u^*$ in order to show that, in the strong topology $H^1(D)$,
$u^\varepsilon - u^{\varepsilon,1}$ converges to zero, for

$$u^{\varepsilon,1}(x) = u^*(x) + \varepsilon \sum_{i=1}^d \partial_{x_i} u^*(x)\, w_{per,e_i}(x/\varepsilon).$$

The rate of convergence may also be made precise.


The practical conclusion is that, at the price of only computing the $d$ periodic
problems of Eq. 3, the solution to Eq. 2 can be efficiently approximated for $\varepsilon$ small.
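In one space dimension the corrector problem of Eq. 3 can be solved by hand. The flux $a_{per}(y)(1 + w_{per}'(y))$ is constant, and Eq. 4 reduces to the harmonic mean of $a_{per}$ over the unit cell, not the arithmetic mean. The sketch below compares the exact fine-scale solution of Eq. 2 (with $f = 1$ on $D = (0, 1)$) against $u^*$ from Eq. 5 and against a naive arithmetic-mean guess. It assumes the hypothetical coefficient $a_{per}(y) = 2 + \sin(2\pi y)$, whose harmonic mean is $\sqrt{3}$, and the function names are illustrative only.

```python
import numpy as np


def a_per(y):
    """Hypothetical 1-periodic coefficient, bounded between 1 and 3."""
    return 2.0 + np.sin(2.0 * np.pi * y)


def homogenized_coefficient_1d(a, n_quad=10_000):
    """Harmonic mean of a over the unit cell Q = [0, 1) (1D version of Eq. 4)."""
    y = (np.arange(n_quad) + 0.5) / n_quad  # midpoint rule
    return 1.0 / np.mean(1.0 / a(y))


def fine_solution(eps, n=200_000):
    """Solve -(a_per(x/eps) u')' = 1 on (0, 1) with u(0) = u(1) = 0.

    Integrating once gives a_per(x/eps) u'(x) = C - x, hence
    u(x) = C * int_0^x dt/a - int_0^x t dt/a, with C fixed by u(1) = 0.
    """
    x = np.linspace(0.0, 1.0, n + 1)
    h = x[1] - x[0]
    xm = 0.5 * (x[1:] + x[:-1])  # cell midpoints
    inv_a = 1.0 / a_per(xm / eps)
    i0 = np.concatenate(([0.0], np.cumsum(h * inv_a)))
    i1 = np.concatenate(([0.0], np.cumsum(h * xm * inv_a)))
    c = i1[-1] / i0[-1]
    return x, c * i0 - i1


a_star = homogenized_coefficient_1d(a_per)  # sqrt(3) for this a_per
x, u_eps = fine_solution(eps=0.01)
u_star = x * (1.0 - x) / (2.0 * a_star)     # solution of -a* u'' = 1
u_mean = x * (1.0 - x) / (2.0 * 2.0)        # naive guess using the mean value 2
err_star = np.max(np.abs(u_eps - u_star))
err_mean = np.max(np.abs(u_eps - u_mean))
print(f"a* = {a_star:.6f}, error(a*) = {err_star:.2e}, error(mean) = {err_mean:.2e}")
```

Only one cell problem is needed here (d = 1). The fine-scale solve is cheap in 1D because it reduces to quadrature, which is no longer the case in higher dimensions.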


1.2.2 Random Homogenization 


A first option to go beyond the simplistic setting of periodic structures is to consider
random structures. Of course, materials are never random in nature, but randomness
is a suitable, practical way to encode the ignorance of, or at best the uncertainty on,
the intimate microscopic structure of the material considered.

For homogenization, the random setting is a highly non-trivial extension of the
periodic setting. Many questions, in particular for nonlinear equations, still remain 
open in the random case although they are solved and well documented in the periodic 
case. Fortunately, in the case of linear diffusion equations such as Eq. 1, the state of 
affairs is that, loosely speaking, all the results of convergence still essentially hold 
true but (a) they are more difficult to prove and (b) the convergence rates are even 
more difficult to establish. 

To fix the ideas, we now give some more formal details on one random case. For 
brevity, we skip all technicalities related to the definition of the probabilistic setting, 
which we assume discrete stationary and ergodic (we refer e.g. to [2] for all details). 
We now fix $A(\cdot, \omega)$, a square matrix of size $d$, again bounded and bounded away from
zero, and symmetric, which is assumed stationary in the sense

$$\forall k \in \mathbb{Z}^d, \quad A(x + k, \omega) = A(x, \tau_k \omega) \quad \text{almost everywhere in } x, \text{ almost surely} \tag{6}$$

(where $\tau$ is an ergodic group action). This amounts to assuming that the law of
$A(\cdot, \omega)$ is $\mathbb{Z}^d$-periodic. Then we consider the boundary value problem

$$\begin{cases} -\operatorname{div}\left(A\left(\frac{x}{\varepsilon}, \omega\right)\nabla u^\varepsilon\right) = f & \text{in } D,\\ u^\varepsilon = 0 & \text{on } \partial D. \end{cases} \tag{7}$$


Standard results of random homogenization [8, 29] apply and allow one to identify the
homogenized problem for Eq. 7. These results generalize the periodic results recalled
in Sect. 1.2.1. The solution $u^\varepsilon$ to Eq. 7 converges to the solution to Eq. 5 where the
homogenized matrix is now defined as

$$[A^*]_{ij} = \mathbb{E}\left[\int_Q e_i^T A(y, \cdot)\left(e_j + \nabla w_{e_j}(y, \cdot)\right) dy\right], \tag{8}$$

where for any $p \in \mathbb{R}^d$, $w_p$ is the solution (unique up to the addition of a random
constant) to

$$\begin{cases} -\operatorname{div}\left[A(y, \omega)\left(p + \nabla w_p(y, \omega)\right)\right] = 0, & \text{a.s. on } \mathbb{R}^d,\\ \nabla w_p \text{ is stationary in the sense of Eq. 6},\\ \mathbb{E}\left[\int_Q \nabla w_p\right] = 0. \end{cases} \tag{9}$$


A striking difference between the random setting and the periodic setting can be
observed by comparing Eqs. 3 and 9. In the periodic case, the corrector problem is
posed on a bounded domain, namely the periodic cell $Q$. In sharp contrast, the corrector
problem in Eq. 9 of the random case is posed on the whole space $\mathbb{R}^d$, and
cannot be reduced, at the theoretical level, to a problem posed on a bounded domain.
The fact that the random corrector problem is posed on the entire space has far-reaching
consequences both for theory and for numerical practice. To some extent, the
unboundedness of the domain on which the corrector problem is posed is a common
denominator of all the settings that we will address in the present survey. This
unboundedness of the corrector problem is also a fundamental characteristic
feature of the practically relevant problems of materials science. We cannot emphasize
this fact enough.

In order to approximate Eq. 9 numerically, truncations of the problem have to be
considered, typically on large domains $Q_N = [0, N]^d$ and using periodic boundary
conditions. The actual homogenized coefficients are only captured in the asymptotic
regime $Q_N \to \mathbb{R}^d$. Overall, it is fair to consider that the approach is very expensive
computationally, and often actually prohibitively expensive. Therefore, in many
practical situations, the size of the “large” domain $Q_N$ considered is in fact small,
and the number of realizations of the random microstructure considered therein to
approximate the expectation in Eq. 8 is also dramatically limited. Put differently, there
is a large gap looming between the actual practice and the regime where the theory
provides relevant information.
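The effect of truncation can be seen in a one-dimensional sketch, where the corrector problem on $Q_N$ with periodic boundary conditions is explicit. The flux $a(1 + w')$ is constant on $Q_N$, so the apparent coefficient $A^*_N(\omega)$ is the harmonic mean of $a$ over $Q_N$. The random model is a hypothetical choice: on each unit cell $[k, k+1)$ the coefficient equals 1 or 3 with probability 1/2, independently. This model is stationary in the sense of Eq. 6, and its exact homogenized coefficient is $1/\mathbb{E}[1/a] = 3/2$. The sketch estimates the expectation of Eq. 8 from a limited number of realizations. For small $N$ it shows both a large spread and a systematic bias.

```python
import numpy as np


def apparent_coefficient(a_cells):
    """A*_N(omega) on Q_N = [0, N) with periodic boundary conditions, in 1D.

    For a coefficient equal to a_k on each unit cell [k, k+1), the corrector
    flux a (1 + w') is constant on Q_N, which yields the harmonic mean.
    """
    return 1.0 / np.mean(1.0 / a_cells)


def sample_apparent(N, n_real, rng):
    """Empirical mean and spread of A*_N over n_real realizations."""
    values = np.array([
        apparent_coefficient(rng.choice([1.0, 3.0], size=N))  # i.i.d. cells
        for _ in range(n_real)
    ])
    return values.mean(), values.std()


rng = np.random.default_rng(0)
exact = 1.0 / np.mean([1.0, 1.0 / 3.0])  # 1 / E[1/a] = 1.5
for N in (10, 100, 1000, 10000):
    mean, spread = sample_apparent(N, n_real=200, rng=rng)
    print(f"N={N:6d}  mean={mean:.4f}  std={spread:.4f}  bias={mean - exact:+.4f}")
```

In higher dimensions no such closed form exists. Each realization then requires a full PDE solve on $Q_N$, which is what makes the approach expensive in practice.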

Important theoretical questions about the quality and the rate of the convergence in 
terms of the truncation size arise: see, in particular, the pioneering works by Bourgeat 
and Piatnitski [17, 18] and, more broadly and recently, a series of works by F. Otto, 
A. Gloria, S. Armstrong, Ch. Smart, J.-C. Mourrat and their many collaborators, see 
e.g. [25, 26] for examples of contributions. 


2 A Mathematical Toolbox for “Weakly” Random 
Problems 


With this section we begin our study of homogenization of non-periodic problems. We
have already mentioned that one possible option is the random setting, and we have
mentioned the practical difficulties it raises. In many practical situations, however,
the real material under consideration is not far from being a periodic material. At
zeroth order of approximation, the material can be considered periodic, and it is
only at a higher order that disorder might play a role. We choose, in this section,
to encode this disorder using randomness. When the “material” under study is the
geological bedrock, there is of course no reason for this assumption to be valid, and
the classical random model of Sect. 1.2.2 might be more relevant. In contrast, the
assumption makes a lot of sense when considering manufactured materials, where
the defect of periodicity is typically due to flaws in the process: the material was meant
to be periodic, but it is actually not. The practically relevant question is to understand
whether or not, despite its smallness, the microscopic amount of randomness might
affect the macroscale at order one. Answering this question requires a modeling
strategy for the imperfect material.

Our purpose here is to outline a modeling strategy that accounts for the presence
of randomness in a multi-scale computation, but specifically addresses the case when
the amount of randomness present in the system is small. In this case, we call the
material weakly random. The weakly random material is thus considered as a small
perturbation of a periodic material. Our aim is to introduce a toolbox of possible
modeling strategies that all keep the computational workload limited (in comparison
to a direct attack of the problem as if, like in Sect. 1.2.2, the randomness were not
small) and that provide an approximation of the response of the material which one
may certify by error estimates.

As mentioned above, the simple diffusion equation Eq. 1 is a perfect prototypical
testbed for our toolbox. It is ubiquitous in many, if not all, engineering sciences
and life sciences. Although we have not developed our theory and computations for
other, more general equations and settings, we are convinced that the same line of
approach (namely a small amount of randomness as compared to a reference periodic
setting, plus an expansion in the randomness amplitude, and simplified computations)
can be useful in many contexts.
2.1 Random Deformations of the Periodic Setting


A first random setting, which has been introduced and studied in [11] and is not, math- 
ematically, a particular case of the classical stationary setting recalled in Sect. 1.2.2, 
consists of random deformations of a periodic structure. As said above, it is motivated 
by the consideration of random geometries that have some specific proximity to the 
periodic setting. The periodic setting is here taken as a reference configuration, some- 
what similarly to the classical mathematical formalization of continuum mechanics 
where a reference configuration is used to define the state of the material under study. 
Another related idea, in a completely different context, is the consideration of a ref- 
erence element for finite element computations. The real situation is then seen via 
a mapping from the reference configuration to the actual configuration. Here, this 
mapping is a random mapping (otherwise, one would know everything about the material
up to a change of coordinates and the approach would be of little practical interest).
Assuming some regularity of this mapping induces constraints on the sets
of geometries that the microstructures of the material can take. Put differently, the 
material structure, even though it is not entirely known, is not arbitrarily disordered. 

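We fix some $\mathbb{Z}^d$-periodic $A_{per}$, assumed to satisfy the usual properties of boundedness
and coerciveness, and we consider the following specific form of the coefficient
$A_\varepsilon$ in Eq. 1:

$$A_\varepsilon(x, \omega) = A_{per}\left(\Phi^{-1}\left(\frac{x}{\varepsilon}, \omega\right)\right), \tag{10}$$

where the function $\Phi(\cdot, \omega)$ is assumed to be, almost surely, a diffeomorphism from $\mathbb{R}^d$
to $\mathbb{R}^d$. The diffeomorphism, called a random stationary diffeomorphism, is assumed
to additionally satisfy

$$\operatorname*{ess\,inf}_{\omega \in \Omega,\, x \in \mathbb{R}^d} \left[\det\left(\nabla\Phi(x, \omega)\right)\right] = \nu > 0, \tag{11}$$

$$\operatorname*{ess\,sup}_{\omega \in \Omega,\, x \in \mathbb{R}^d} \left(\left|\nabla\Phi(x, \omega)\right|\right) = M < \infty, \tag{12}$$

$$\nabla\Phi(x, \omega) \text{ is stationary in the sense of Eq. 6.} \tag{13}$$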


Note that the first two assumptions enforce the “homogeneity” of the diffeomorphism:
the deformed periodic structure neither implodes nor explodes anywhere.

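Homogenization holds for the above problem (the details are made precise in [11]).
The homogenized problem again reads as in Eq. 5 with the homogenized matrix given
by

$$[A^*]_{ij} = \det\left(\mathbb{E}\left[\int_Q \nabla\Phi(z, \cdot)\,dz\right]\right)^{-1} \mathbb{E}\left[\int_{\Phi(Q, \cdot)} e_i^T A_{per}\left(\Phi^{-1}(y, \cdot)\right)\left(e_j + \nabla w_{e_j}(y, \cdot)\right) dy\right], \tag{14}$$

where for any $p \in \mathbb{R}^d$, $w_p$ is the solution (unique up to the addition of a random
constant and belonging to the suitable functional space) to

$$\begin{cases} -\operatorname{div}\left[A_{per}\left(\Phi^{-1}(y, \omega)\right)\left(p + \nabla w_p\right)\right] = 0, & \text{a.s. on } \mathbb{R}^d,\\ w_p(y, \omega) = \tilde{w}_p\left(\Phi^{-1}(y, \omega), \omega\right), \quad \nabla\tilde{w}_p \text{ is stationary in the sense of Eq. 6},\\ \mathbb{E}\left[\int_{\Phi(Q, \cdot)} \nabla w_p(y, \cdot)\,dy\right] = 0. \end{cases} \tag{15}$$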
At first sight, there seems to be no simplification whatsoever in considering the
above system Eq. 15, which even looks considerably more complex than the classical random
problem Eq. 9. The key point, though, is that the introduction of a new modeling
“parameter”, namely the random diffeomorphism $\Phi$, allows one, in some sense, to introduce
a distance between the periodic case ($\Phi = \mathrm{Id}$) and the random case ($\Phi \neq \mathrm{Id}$)
considered. Our next step consists in proceeding in this direction.


2.2 Small Random Perturbations of the Periodic Setting 


We now superimpose on the setting defined in the previous section the assumption
that the material considered is a small perturbation of a periodic material. This is
formalized upon writing

$$\Phi(x, \omega) = x + \eta\,\Psi(x, \omega) + O(\eta^2), \tag{16}$$

where $\Psi$ is any random field such that $\Phi$ is a random stationary diffeomorphism that
satisfies Eqs. 11-13 for $\eta$ sufficiently small.

It has been shown in [11] that, when $\Phi$ is such a perturbation of the identity
map (see Fig. 1), the solution to the corrector problem of Eq. 15 may be expanded
in powers of the small parameter $\eta$. It reads $\tilde{w}_p(x, \omega) = w_{per,p}(x) + \eta\, w^1_p(x, \omega) + O(\eta^2)$,
where $w_{per,p}$ is the periodic corrector defined in Eq. 3 and where $w^1_p$ solves


Fig. 1 Small random 
deformation of a periodic 
structure. In the unperturbed 
periodic environment, the 
inclusions are circular and 
periodic. The deformation of 
each inclusion is performed 
randomly. Source [21] 
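$$\begin{cases} -\operatorname{div}\left[A_{per}\nabla w^1_p\right] = \operatorname{div}\left[-A_{per}\nabla\Psi\,\nabla w_{per,p} - \left(\nabla\Psi^T - (\operatorname{div}\Psi)\,\mathrm{Id}\right)A_{per}\left(p + \nabla w_{per,p}\right)\right],\\ \nabla w^1_p \text{ is stationary and } \mathbb{E}\left[\int_Q \nabla w^1_p\right] = 0. \end{cases} \tag{17}$$

The problem of Eq. 17 in $w^1_p$ is random in nature, but it is in fact easy to see, taking
the expectation, that $\bar{w}^1_p = \mathbb{E}(w^1_p)$ is periodic and solves the deterministic problem

$$-\operatorname{div}\left[A_{per}\nabla\bar{w}^1_p\right] = \operatorname{div}\left[-A_{per}\,\mathbb{E}(\nabla\Psi)\,\nabla w_{per,p} - \left(\mathbb{E}(\nabla\Psi^T) - \mathbb{E}(\operatorname{div}\Psi)\,\mathrm{Id}\right)A_{per}\left(p + \nabla w_{per,p}\right)\right].$$

This is useful because, on the other hand, the knowledge of $w_{per,p}$ and $\bar{w}^1_p$ suffices to
obtain a first order expansion (in $\eta$) of the homogenized matrix. Indeed, with $A^*_{per}$ being
the periodic homogenized tensor as defined in Eq. 4, and

$$[A^1]_{ij} = -\int_Q \mathbb{E}(\operatorname{div}\Psi)\,[A^*_{per}]_{ij} + \int_Q \left(e_i + \nabla w_{per,e_i}\right)^T A_{per}\, e_j\, \mathbb{E}(\operatorname{div}\Psi) + \int_Q \left(\nabla\bar{w}^1_{e_i} - \mathbb{E}(\nabla\Psi)\,\nabla w_{per,e_i}\right)^T A_{per}\, e_j,$$

we then have

$$A^* = A^*_{per} + \eta A^1 + O(\eta^2). \tag{18}$$

For $\eta$ sufficiently small, depending on the desired accuracy, the approach therefore
provides a computational strategy to approximately compute the homogenized tensor
that bypasses the classical random problem and only considers (a sequence of)
deterministic, periodic problems.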
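In one dimension the corrector problem of Eq. 15 is explicit. This simplification is not treated in the text and is derived here only for illustration. The flux $a_{per}(\Phi^{-1}(y))(1 + w_p'(y))$ is constant, and Eq. 14 reduces to $a^* = \mathbb{E}\int_Q \Phi' \,/\, \mathbb{E}\int_Q \Phi'/a_{per}$. The sketch uses the hypothetical field $\Psi'(y, \omega) = X_k(\omega)\sin(2\pi y)$ on each cell $[k, k+1)$, with $X_k$ i.i.d. of mean $\mu = 1/2$. Only expectations and deterministic periodic cell integrals then enter. The sketch checks that the gap between $a^*(\eta)$ and the first order expansion of Eq. 18 shrinks like $\eta^2$. The 1D value of $A^1$ used here comes from differentiating this closed form, not from evaluating the general d-dimensional expression.

```python
import numpy as np

MU = 0.5  # E[X_k], e.g. X_k ~ Bernoulli(1/2) (hypothetical choice)


def a_per(y):
    return 2.0 + np.sin(2.0 * np.pi * y)


def g(y):
    # Cell profile of Psi'; it vanishes at integers, so Phi' stays continuous.
    return np.sin(2.0 * np.pi * y)


def cell_mean(f, n=20_000):
    """Midpoint-rule approximation of the integral of f over Q = [0, 1)."""
    y = (np.arange(n) + 0.5) / n
    return np.mean(f(y))


h0 = cell_mean(lambda y: 1.0 / a_per(y))   # int_Q 1/a_per
g0 = cell_mean(g)                           # int_Q g
g1 = cell_mean(lambda y: g(y) / a_per(y))   # int_Q g/a_per


def a_star(eta):
    """1D homogenized coefficient E int_Q Phi' / E int_Q Phi'/a_per."""
    return (1.0 + eta * MU * g0) / (h0 + eta * MU * g1)


a_star_per = 1.0 / h0
a1 = MU * (g0 * h0 - g1) / h0**2            # d a*/d eta at eta = 0
for eta in (0.1, 0.05, 0.025):
    gap = abs(a_star(eta) - (a_star_per + eta * a1))
    print(f"eta={eta:<6} |A* - (A*_per + eta A1)| = {gap:.2e}")
```

Halving $\eta$ should divide the gap by roughly four, consistent with the $O(\eta^2)$ remainder in Eq. 18.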


2.3 Rare but Possibly Large Random Perturbations 


The previous section has shown that a perturbative approach can be an interesting
modeling and computational strategy for cases when the structure of the material is
random but “close” to a periodic structure. We now proceed in a similar direction
by presenting an alternative perturbative approach, described in full detail in [3, 4].
We consider

$$A_\eta(x, \omega) = A_{per}(x) + b_\eta(x, \omega)\, C_{per}(x), \tag{19}$$

instead of a coefficient $A_{per}\left(\Phi^{-1}(\cdot, \omega)\right)$ with $\Phi$ of the form Eq. 16. In Eq. 19, $A_{per}$ is
again a periodic matrix modeling the unperturbed material, $C_{per}$ is a periodic matrix
Fig. 2 Defects in a periodic structure. In the unperturbed periodic environment, the
inclusions are periodic. The defects considered consist in eliminating some of these
inclusions. The elimination may be deterministic (as in Sect. 3 below), or random (as
in Sect. 1.2.2). One may also consider small probabilities of elimination and construct
the corresponding mathematical setting (as in Sect. 2.3). Source [3]


modeling the perturbation, and $b_\eta(\cdot, \omega)$ is a random field that is, in some sense, small.
Consider then the case

$$b_\eta(x, \omega) = \sum_{k \in \mathbb{Z}^d} \mathbf{1}_{Q + k}(x)\, B^k_\eta(\omega), \tag{20}$$

where the $B^k_\eta$ are, say, independent identically distributed random variables. One
particularly interesting case (see [3, 4] for this case and others) is that when the
common law of the $B^k_\eta$ is a Bernoulli law of parameter $\eta$ (see Fig. 2).

We now explain our approach formally. The mathematical correctness of the
approach has been established in the works [23, 40].

To start with, we notice that in the corrector problem

$$-\operatorname{div}\left[A_\eta(y, \omega)\left(p + \nabla w_p(y, \omega)\right)\right] = 0, \tag{21}$$

the only source of randomness comes from the coefficient $A_\eta(y, \omega)$. Therefore, in
principle, if one knows the law of this coefficient $A_\eta$, one knows the law of the corrector
function $w_p(y, \omega)$ and therefore may compute the homogenized coefficient $A^*_\eta$,
the latter being a function of this law. When the law of $A_\eta$ is an expansion in terms
of a small coefficient, so is the law of $w_p$. Consequently, $A^*_\eta$ must be attainable using
an expansion.

Heuristically, on the cube $Q_N$ and at order 1 in $\eta$, the probability to see the perfect
periodic material (entirely modeled by the matrix $A_{per}$) is $(1 - \eta)^{N^d} = 1 - N^d\eta + O(\eta^2)$,
while the probability to see the unperturbed material on all cells except one
(where the material has matrix $A_{per} + C_{per}$) is $N^d\eta(1 - \eta)^{N^d - 1} = N^d\eta + O(\eta^2)$.
All other configurations, with two or more cells perturbed, contribute at orders
higher than or equal to $\eta^2$. This gives the intuition (indeed confirmed by a mathematical
proof) that the first order correction comes from the difference between
the material perfectly periodic except on one cell and the perfect material itself:
$A^*_\eta = A^*_{per} + \eta A^1 + o(\eta)$, where $A^*_{per}$ is the homogenized matrix for the unperturbed
periodic material and

$$A^1 e_i = \lim_{N \to +\infty} \int_{Q_N} \left[\left(A_{per} + \mathbf{1}_Q C_{per}\right)\left(\nabla w^N_{e_i} + e_i\right) - A_{per}\left(\nabla w_{per,e_i} + e_i\right)\right], \tag{22}$$

where $w^N_{e_i}$ solves

$$-\operatorname{div}\left(\left(A_{per} + \mathbf{1}_Q C_{per}\right)\left(e_i + \nabla w^N_{e_i}\right)\right) = 0 \text{ in } Q_N, \quad w^N_{e_i} \text{ is } Q_N\text{-periodic}. \tag{23}$$
Note that the integral appearing on the right-hand side of Eq. 22 is not normalized: it a
priori scales as the volume $N^d$ of $Q_N$ and has a finite limit only because of cancellation
effects between the two terms in the integrand.

This perturbative approach has been extensively tested. It has been observed that
the large $N$ limit for cubes of size $N$ is already accurately approximated for limited
values of $N$. As in the previous section (Sect. 2.2), the computational efficiency of
the approach is clear: solving the two periodic problems with coefficients $A_{per}$ and
$A_{per} + \mathbf{1}_Q C_{per}$ for a limited size $N$ is much less expensive than solving the original
random corrector problem for a much larger size $N$. When the second order term
is needed, configurations with two defects have to be computed. They can all be
seen as a family of PDEs, parameterized by the geometrical location of the defects
(see again Fig. 2). Reduced basis techniques have been shown to provide a definite
speed-up in the computation, see [33].
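A one-dimensional toy version makes both the cancellation in Eq. 22 and the fast convergence in $N$ visible. This is an illustration derived for 1D, not a computation from [3, 4]. With Bernoulli defects, the exact homogenized coefficient is $a^*_\eta = 1/((1 - \eta)h_0 + \eta h_1)$, where $h_0 = \int_Q 1/a_{per}$ and $h_1 = \int_Q 1/(a_{per} + c_{per})$, so $A^1 = (h_0 - h_1)/h_0^2$. On $Q_N$ with one defective cell, the corrector flux of Eq. 23 is constant, and the truncated integral of Eq. 22 equals $N(h_0 - h_1)/(h_0(h_1 + (N-1)h_0))$. Its relative error is $O(1/N)$. The coefficients $a_{per}(y) = 2 + \sin(2\pi y)$ and $c_{per} = 1$ are hypothetical choices.

```python
import numpy as np


def a_per(y):
    return 2.0 + np.sin(2.0 * np.pi * y)


def c_per(y):
    return np.ones_like(y)  # hypothetical perturbation (scalar 1 in 1D)


def cell_mean(f, n=20_000):
    """Midpoint-rule approximation of the integral of f over Q = [0, 1)."""
    y = (np.arange(n) + 0.5) / n
    return np.mean(f(y))


h0 = cell_mean(lambda y: 1.0 / a_per(y))               # perfect cell
h1 = cell_mean(lambda y: 1.0 / (a_per(y) + c_per(y)))  # defective cell
a_star_per = 1.0 / h0


def a1_truncated(N):
    """Eq. 22 on Q_N = [0, N) in 1D, with a single defective cell Q.

    The corrector flux a (1 + w') is constant on Q_N, so the integrand of
    Eq. 22 is the constant (flux with the defect) - (periodic flux); the
    unnormalized integral over Q_N is N times that difference.
    """
    flux_defect = N / (h1 + (N - 1) * h0)  # harmonic mean over Q_N
    return N * (flux_defect - a_star_per)


a1_limit = (h0 - h1) / h0**2  # N -> infinity
for N in (1, 2, 5, 10, 100):
    print(f"N={N:4d}  A1_N={a1_truncated(N):.5f}  (limit {a1_limit:.5f})")


def a_star(eta):
    """Exact 1D homogenized coefficient for Bernoulli(eta) defects."""
    return 1.0 / ((1.0 - eta) * h0 + eta * h1)
```

Each of the two terms in the integrand, once integrated over $Q_N$, grows like $N$. Only their difference has a finite limit, which is the cancellation noted after Eq. 22.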

On an abstract level, we note that, in the proposed approach for the “weakly” random
regime, the determination of the homogenized tensor for a material containing
defects with random locations is reduced to a set of computations of the solutions
to corrector problems such as Eq. 23 for materials with defects at some particular
deterministic locations. This naturally establishes a methodological link with our
next section, where we indeed consider materials with deterministic defects. The link
is actually more than methodological: the theoretical results of Sect. 3 establishing
that the corrector problems with deterministic defects are uniquely solvable in a suitable
class of functions are readily useful in the random setting for the foundation of
the approach described here in Sect. 2.


3 Deterministic Defects Within an Otherwise Periodic 
Structure 


We return to the generic multi-scale diffusion equation Eq. 1. Under quite general
and mild assumptions on the diffusion (possibly matrix-valued) coefficient $A_\varepsilon$ (which
need not be of the form $A_\varepsilon = A_{per}(x/\varepsilon)$ or obey any structural assumption of that
type), presumably varying at the tiny scale $\varepsilon$, the equation admits a homogenized
limit, which is indeed of the same form as Eq. 1, namely Eq. 5. Celebrated results
along these lines are due to S. Spagnolo, E. De Giorgi and L. Tartar and their respec- 
Fig. 3 Localized defects in a periodic structure. Some periodic cells in the center of the domain
are perturbed. The error $u^\varepsilon - u^{\varepsilon,1}$ is displayed when calculating $u^{\varepsilon,1}$ using (left) the periodic
corrector $w_{per,p}$ solution to Eq. 3 and (right) the adjusted corrector $w_p$ solution to Eq. 24. In the
former case, the size of the committed error is almost a “defect detector”. In the latter case, the
error is homogeneous throughout the domain, recovering the quality of the approximation of the
unperturbed periodic case. Source [12]


tive collaborators, see [42]. The strength of such results is their generality. They 
are obtained by a compactness argument. Schematically, the sequence of inverse
operators $\left[-\operatorname{div}(A_\varepsilon\nabla\cdot)\right]^{-1}$ is (weakly) compact in the suitable topology, converges,
up to an extraction, and its limit can be proven to be an operator of the same type,
namely $\left[-\operatorname{div}(A^*\nabla\cdot)\right]^{-1}$. On the other hand, and precisely because of the generality,
not much is known about the limit $A^*$. This contrasts with periodic homogenization,
which is both explicit (the limit coefficient $A^*$ is known by a formula, namely Eq. 4,
as a function of the, also known, corrector) and precise (the rate of convergence of
$u^\varepsilon$ to $u^*$ is known for a large variety of norms). Besides their theoretical interest
per se, these two ingredients combined allow for envisioning, in practice, a numerical
approach for the computation of the homogenized limit, certified by a numerical
analysis that guarantees a control of the numerical error committed, as a function of $\varepsilon$
and the discretization parameters.

The question arises to find settings sufficiently general that still allow for the 
quality of results of the periodic setting. The recent decade has witnessed several 
mathematical endeavors in this direction. We describe here such an endeavor and 
give one prototypical example of such a setting, where we illustrate the novelty of 
the mathematical questions involved (Fig. 3). 

Consider Eq. 1 and assume that $A_\varepsilon = A(\cdot/\varepsilon)$, where the coefficient $A$ models a
periodic material perturbed by a localized defect. This setting, mathematically, may
be encoded in $A = A_{per} + \tilde{A}$ for $\tilde{A} \in L^p(\mathbb{R}^d)$ for some $p < +\infty$. Clearly, the presence
of this defect does not affect the macroscopic behavior, that is, the homogenized
equation, with the same homogenized coefficient $A^*$, which only depends on averages
of $A$ over large, asymptotically infinite volumes, for which the addition of a
function such as $\tilde{A}$ does not matter. On the other hand, when it comes to making this
limit more precise, one intuitively realizes, zooming in locally in the material, that
the corrector equation that describes the microscopic response of the material reads
as

$$-\operatorname{div}\left(A\left(e_i + \nabla w_{e_i}\right)\right) = 0. \tag{24}$$


This equation is different from Eq. 3, and, in sharp contrast with Eq. 3 (and similarly
to what we observed for Eq. 9 in the random setting), does not reduce to an
equation set on a bounded domain with periodic boundary conditions. Note that,
for the particular choice $\tilde{A} = \mathbf{1}_Q C_{per}$, Eq. 23 is a particular instance of Eq. 24 when
$N = +\infty$. In essence, Eq. 24 is posed on the entire ambient space $\mathbb{R}^d$, a reflection
of the fact that, at the microscopic scale, the defect has broken the periodicity of the
environment: the local response is affected by the defect and depends on the state
of the whole microscopic structure. A considerable mathematical difficulty follows.
The classical toolbox for the study of the well-posedness of (here linear) equations
on bounded domains, namely the Lax-Milgram Lemma in the coercive case, the Fredholm
Alternative, etc., all techniques that one way or another rely upon the boundedness of
the domain or the compactness of the setting, is now ineffective. Should $A$ be random
and stationary, Eq. 24 would read as Eq. 9 and admit an equivalent formulation
on the abstract probability space. This would make up for compactness, but other
significant complications would arise. For Eq. 24, the difficulty must be embraced.
A related difficulty is to define the set of admissible functions for solutions, or the
variational space in an energetic formulation of the problem. In the specific case
$A = A_{per} + \tilde{A}$ with $\tilde{A} \in L^p(\mathbb{R}^d)$, one seeks the solution to Eq. 24 in the form
$w_{e_i} = w_{per,e_i} + \tilde{w}_{e_i}$, that is, with reference to the periodic solution $w_{per,e_i}$, somewhat
in echo to what we achieved in Sect. 2.3. Equation 24 then rewrites as

$$-\operatorname{div}\left(A\,\nabla\tilde{w}_{e_i}\right) = \operatorname{div}\left(\tilde{f}\right),$$

where $\tilde{f} = \tilde{A}\left(e_i + \nabla w_{per,e_i}\right) \in L^p(\mathbb{R}^d)$, which, by homogeneity, suggests that the suitable
functional space for $\nabla\tilde{w}_{e_i}$ is $L^p(\mathbb{R}^d)$. The question then arises whether the operator
$\nabla\left[\operatorname{div}(A\nabla\cdot)\right]^{-1}\operatorname{div}$ acts continuously in $L^p(\mathbb{R}^d)$. The answer depends
on the properties of the coefficient $A$. In the present setting, it is positive for
all $1 < p < +\infty$. The theoretical analysis to reach this conclusion heavily relies
upon the celebrated works [5-7] by M. Avellaneda and F. H. Lin for the periodic
case (see also [30, 41]).

The consideration of the one-dimensional version of the problem clearly shows
(this particular example is worked out in [12]) that when one considers the specific
corrector $w$ solution to $-\frac{d}{dy}\left(\left(a_{per} + \tilde{a}\right)\left(1 + \frac{d}{dy}w\right)\right) = 0$, instead
of the periodic corrector $w_{per}$ solution to $-\frac{d}{dy}\left(a_{per}\left(1 + \frac{d}{dy}w_{per}\right)\right) = 0$,
then the quality of the (two-scale, first order) approximation of the solution $u^\varepsilon$ is
immediately improved near the defect and at the scale of the defect.
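This one-dimensional effect can be checked numerically in a few lines. Consider $-\frac{d}{dx}\left(a(x/\varepsilon)\,\frac{du^\varepsilon}{dx}\right) = 1$ on (0, 1) with homogeneous Dirichlet conditions. Integrating once gives the exact derivative of $u^\varepsilon$. Integrating the corrector equation gives $(a_{per} + \tilde a)(1 + w') = $ const. Requiring $w' - w_{per}'$ to lie in $L^p$, as in the decomposition $w = w_{per} + \tilde w$ above, forces this constant to equal $a^*$. The sketch uses the hypothetical choices $a_{per}(y) = 2 + \sin(2\pi y)$, $\tilde a = 1$ on two cells near $x = 0.25$, and $\varepsilon = 0.01$. It compares the leading-order gradient $u^{*\prime}(x)(1 + w'(x/\varepsilon))$ of the two-scale expansion built with each corrector. Only this leading term is kept, so this is not the computation of [12].

```python
import numpy as np

EPS = 0.01
N_GRID = 400_000


def a_per(y):
    return 2.0 + np.sin(2.0 * np.pi * y)


x = (np.arange(N_GRID) + 0.5) / N_GRID  # midpoints of a uniform grid on (0, 1)
y = x / EPS
k0 = np.floor(0.25 / EPS)               # defect placed near x = 0.25
a_tilde = np.where((y >= k0) & (y < k0 + 2), 1.0, 0.0)  # two perturbed cells
a = a_per(y) + a_tilde
a_star = np.sqrt(3.0)  # harmonic mean of a_per; the localized defect leaves it unchanged

# Exact derivative of u_eps for f = 1: a(x/eps) u_eps'(x) = C - x,
# with C fixed by u_eps(0) = u_eps(1) = 0.
C = np.sum(x / a) / np.sum(1.0 / a)
du_eps = (C - x) / a
du_star = (0.5 - x) / a_star            # u*(x) = x (1 - x) / (2 a*)

# Leading-order gradient u*'(x) (1 + w'(x/eps)); the O(eps) term eps u*'' w is dropped.
du_periodic = du_star * a_star / a_per(y)  # periodic corrector: 1 + w_per' = a*/a_per
du_adjusted = du_star * a_star / a         # adjusted corrector: 1 + w' = a*/(a_per + a_tilde)

near = np.abs(x - (k0 + 1) * EPS) < 0.05
err_per_near = np.max(np.abs(du_eps - du_periodic)[near])
err_per_far = np.max(np.abs(du_eps - du_periodic)[~near])
err_adj = np.max(np.abs(du_eps - du_adjusted))
print(f"periodic corrector: near defect {err_per_near:.3f}, far {err_per_far:.3f}")
print(f"adjusted corrector: everywhere  {err_adj:.3f}")
```

With the periodic corrector, the gradient error is of order one near the defect and small elsewhere, which is the "defect detector" behavior of Fig. 3. With the adjusted corrector, the error stays small throughout the domain.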

In dimensions higher than or equal to two, the proof is more difficult. Under
appropriate conditions, the solution $u^\varepsilon$ is well approximated in $H^1$ norm, both at
scale one and at scale $\varepsilon$ (thus in particular in $L^\infty$ norm), by the first order expansion
$u^{\varepsilon,1}(x) = u^*(x) + \varepsilon \sum_{i=1}^d \partial_{x_i} u^*(x)\, w_{e_i}(x/\varepsilon)$ constructed using the specific
correctors $w_{e_i}$. The latter approximation property does not in general hold true for the
periodic first-order approximation $u^{\varepsilon,1}_{per}(x) = u^*(x) + \varepsilon \sum_{i=1}^d \partial_{x_i} u^*(x)\, w_{per,e_i}(x/\varepsilon)$
constructed using the periodic correctors $w_{per,e_i}$. One may even make precise the rate
of convergence as a function of the small parameter $\varepsilon$, and likewise may prove similar
convergence for different Sobolev or Hölder norms. The proof of these convergences
was first presented in the case $p = 2$ (and slightly formally) in [12]. All results
and extensions are carried out in a series of works [9, 10, 13-15].

The procedure above is not restricted to the linear diffusion problem Eq. 1. One may consider semi-linear equations, quasi-linear equations, systems, etc. Of course, the analysis becomes all the more delicate as the complexity of the equation increases. One such example, namely a Hamilton-Jacobi equation, is the subject of [19] and of work in progress by the author and his collaborators, see [16, 20, 28].

Various other types of defects may be considered for homogenization problems that are otherwise “simple”. They may formally decay at infinity (like the “localized” functions $\tilde{A}$ manipulated above), or not. In the former case, the problem at infinity (that is, the problem obtained upon translating the equation far away from the defect) is identical to the underlying periodic problem. In the latter case, the situation may depend sensitively upon what the problem “at infinity” looks like. There may even exist several such problems. Another prototypical example is related to the modeling of grain boundaries in materials science: two different periodic structures are connected across an interface. The defect is, say, a plane separating the two structures, and at large distances from this interface, different periodic structures are present depending upon which side of the interface is considered, see [13]. The corresponding mathematical problem is theoretically challenging and practically relevant. In all cases, the purpose is to identify the homogenized, macroscopic limit while retaining some of the microscopic features that make the problem relevant.


4 Multi-scale Finite Element Approaches and Nonperiodicity


Multi-scale Finite Element Methods, abbreviated as MsFEM, have proved to be efficient in a number of contexts. In essence, these approaches expand the numerical solution upon a specific finite-dimensional basis made of functions that are themselves solutions to a highly oscillatory local problem, at scale $\varepsilon$, involving the differential operator present in the original equation. This problem-dependent basis set, precomputed in an offline stage, is likely to better encode the fine-scale oscillations of the solution and therefore allows one to capture the solution more accurately. Numerical observations along with mathematical arguments show that this is indeed generically the case. The versatility of the classical FEM is lost, but with MsFEM, its efficiency is restored for multi-scale problems.
The standard version of the approach was originally introduced by T. Hou and his collaborators (see the textbook [24] for a general introduction). There exist many variants of such a multi-scale approach, within the formalism of MsFEM or beyond it, and many outstanding numerical analysts and computational scientists have contributed to the field. Classical examples include the Variational Multiscale Method introduced by Hughes et al., the Localized Orthogonal Decomposition method by Målqvist and Peterseim, the localization and subspace decomposition method of R. Kornhuber and H. Yserentant, etc. It is not our purpose here to review all these works. We concentrate here on an issue intrinsically related to our discussion, namely departures from the periodic structure of a material, and their consequences on the accuracy of a dedicated numerical approach.

We recall, on the prototypical multi-scale diffusion problem Eq. 1, that the MsFEM approach, in one of its simplest variants, consists of the following three steps:

1. Introduce a discretization of $D$ with a coarse mesh $\mathcal{T}_H$; throughout this article, we work with the $P^1$ Finite Element space
$$V_H = \mathrm{Span}\{\phi_i^0,\ 1 \le i \le N_H\} \subset H^1_0(D). \qquad (25)$$
2. Solve the local problems (one for each basis function of the coarse mesh)
$$-\mathrm{div}\left(A_\varepsilon \nabla \psi_i^{\varepsilon,K}\right) = 0 \ \text{in } K, \qquad \psi_i^{\varepsilon,K} = \phi_i^0 \ \text{on } \partial K, \qquad (26)$$
on each element $K$ of the coarse mesh $\mathcal{T}_H$, in order to build the multi-scale basis functions. This is typically performed off-line, using a fine mesh $\mathcal{T}_h$, with $h \ll H$.
3. Apply a standard Galerkin approximation of Eq. 1 on the space
$$\mathrm{Span}\{\psi_i^\varepsilon,\ 1 \le i \le N_H\} \subset H^1_0(D), \qquad (27)$$
where $\psi_i^\varepsilon$ is such that $\psi_i^\varepsilon|_K = \psi_i^{\varepsilon,K}$ for all $K \in \mathcal{T}_H$.
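The three steps can be illustrated in one dimension, where the local problems (26) are solved exactly on the fine grid. The Python sketch below is a toy under stated assumptions: $D = (0,1)$, $f = 1$, a hypothetical coefficient $a(x/\varepsilon)$ with $a(y) = 2 + \sin(2\pi y)$, dense matrices, and hypothetical helper names. It assembles the fine $P^1$ system, builds the multiscale basis functions element by element (step 2), and performs the Galerkin projection (step 3) as $R K_h R^T$, where the rows of $R$ hold the fine-grid values of the $\psi_i^\varepsilon$. In 1D with exactly computed basis functions, the MsFEM solution coincides with the fine-scale solution at the coarse nodes, a classical property used in the check below.

```python
import numpy as np

def fine_p1_system(a_cells, f=1.0):
    """P1 stiffness matrix and load vector on a uniform fine grid of (0, 1)
    for -(a u')' = f, with a constant on each fine cell (dense, for clarity)."""
    n = len(a_cells)                      # number of fine cells
    h = 1.0 / n
    K = np.zeros((n + 1, n + 1))
    for e, a_e in enumerate(a_cells):
        K[e:e + 2, e:e + 2] += (a_e / h) * np.array([[1.0, -1.0], [-1.0, 1.0]])
    b = np.full(n + 1, f * h)             # exact load vector for constant f
    b[0] = b[-1] = 0.5 * f * h
    return K, b

def msfem_basis(a_cells, n_coarse):
    """Rows = fine-grid values of the multiscale basis functions attached to
    the interior coarse nodes. On each coarse element, the discrete local
    problem -(a psi')' = 0 with nodal boundary values 0/1 is solved exactly:
    a * psi' is constant, so the increments of psi are proportional to 1/a."""
    n = len(a_cells)
    m = n // n_coarse                     # fine cells per coarse element
    R = np.zeros((n_coarse - 1, n + 1))
    for k in range(n_coarse):             # coarse element [x_k, x_{k+1}]
        inv_a = 1.0 / a_cells[k * m:(k + 1) * m]
        rise = np.concatenate(([0.0], np.cumsum(inv_a))) / inv_a.sum()  # 0 -> 1
        idx = np.arange(k * m, (k + 1) * m + 1)
        if k >= 1:                        # left node x_k is an interior node
            R[k - 1, idx] = 1.0 - rise
        if k <= n_coarse - 2:             # right node x_{k+1} is an interior node
            R[k, idx] = rise
    return R

eps, n_coarse, m = 0.03, 16, 100          # hypothetical parameters, h = H / m
n = n_coarse * m
x_mid = (np.arange(n) + 0.5) / n
a_cells = 2.0 + np.sin(2.0 * np.pi * x_mid / eps)

K_h, b_h = fine_p1_system(a_cells)

# Reference: fine-scale P1 solution with homogeneous Dirichlet conditions.
u_fine = np.zeros(n + 1)
u_fine[1:-1] = np.linalg.solve(K_h[1:-1, 1:-1], b_h[1:-1])

# MsFEM: Galerkin projection onto the span of the multiscale basis (step 3).
R = msfem_basis(a_cells, n_coarse)
coef = np.linalg.solve(R @ K_h @ R.T, R @ b_h)
u_ms = R.T @ coef
```

Replacing the rows of `R` by coarse hat functions gives the standard coarse $P^1$ method, which in general loses this nodal exactness: its element stiffness uses the arithmetic mean of $a$ over each element, whereas the multiscale basis yields the harmonic mean.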


The error analysis of this MsFEM method has been performed for $A_\varepsilon = A_{per}(\cdot/\varepsilon)$ with $A_{per}$ a fixed periodic matrix. Assuming that the basis functions are perfectly determined (that is, $h = 0$), the main error estimate, under the usual regularity assumptions on the data and the mesh, reads

$$\|u^\varepsilon - u_H\|_{H^1(D)} \le C\left(H + \sqrt{\varepsilon} + \sqrt{\frac{\varepsilon}{H}}\right), \qquad (28)$$

where $C$ is a constant independent of $H$ and $\varepsilon$.

When the coarse mesh size $H$ is close to the scale $\varepsilon$, a so-called resonance phenomenon, encoded in the term $\sqrt{\varepsilon/H}$ in Eq. 28, occurs and deteriorates the numerical solution. The oversampling method is a popular technique to reduce this effect. In short, the approach, which is non-conforming, consists in posing each local problem on a domain slightly larger than the actual element $K$ considered, so as to become less sensitive to the arbitrary choice of boundary conditions on that larger domain, and then restricting the functions obtained to the element. That approach significantly improves the results compared to using linear boundary conditions as in Eq. 26. In the periodic case, the following estimate holds:

$$\|u^\varepsilon - u_H\|_{H^1(\mathcal{T}_H)} \le C\left(H + \sqrt{\varepsilon} + \frac{\varepsilon}{H}\right),$$

where $\|u^\varepsilon - u_H\|_{H^1(\mathcal{T}_H)} = \left(\sum_{K \in \mathcal{T}_H} \|u^\varepsilon - u_H\|^2_{H^1(K)}\right)^{1/2}$ is the $H^1$ broken norm of $u^\varepsilon - u_H$.


The boundary conditions imposed on $\partial K$ in Eq. 26 are the so-called linear boundary conditions. Besides the linear boundary conditions and the oversampling technique we have just mentioned, there are many other possible boundary conditions for the local problems. They may give rise to conforming or non-conforming approximations. The choice sensitively affects the overall accuracy. Informally, the whole history of improvements of the original version of MsFEM can be revisited as the history of improvements of the choice of suitable “boundary conditions” for Eq. 26.

The question of how much the choice of boundary conditions for the local problems Eq. 26 alters the overall accuracy is all the more crucial in the context of non-periodic structures. A prototypical case of the difficulty is that of perforated materials. Consider the Poisson problem set on a domain with perforations of size $\varepsilon$. For a generic mesh, the edges (or, alternately, the facets in a three-dimensional setting) of the mesh may intersect the perforations. Difficulties then intuitively arise, since the postulated behavior (linear or otherwise) of the basis functions along the edges has little chance to accurately capture the actual behavior of the exact solution in the presence of perforations. Of course, one may use oversampling in order to circumvent this difficulty, but then the approach is non-conforming and other difficulties arise, besides the increased computational cost. Also, one may consider meshing the domain in such a way that the edges intersect as few perforations as possible. For a periodic array of perforations, this is a decent solution. But in a non-periodic setting, and all the more so in a fully disordered array of perforations, this is impractical. A possible option, introduced in [34] and extended in [35, 38, 39] and other subsequent works by different authors, is to resort to “weak” boundary conditions, in the form of Crouzeix-Raviart boundary conditions. The Dirichlet boundary conditions on $\partial K$ in Eq. 26 are then replaced by conditions of the type


$$\int_{\text{edge}} \psi_i^{\varepsilon,K} = 0 \text{ or } 1, \qquad n_{\text{edge}} \cdot A_\varepsilon \nabla \psi_i^{\varepsilon,K} = \text{constant},$$

on all edges, where the local function $\psi_i^{\varepsilon,K}$ is now associated to an edge $i$. For this approach, under technical assumptions, the error estimate is identical to that for linear boundary conditions, namely Eq. 28.

More importantly, upon using such “weak” boundary conditions in the context of a perforated computational domain (and adding other, generic ingredients, such as bubble functions), the accuracy, even if not improved, becomes significantly more robust with respect to intersections between edges and perforations. A “stress-test” considering two extreme scenarios illustrates this property: see [35] for a detailed comparison of the results obtained with the MsFEM method and different boundary conditions for the local problems on the shifted meshes of Fig. 4.

Let us conclude this section by emphasizing the formal link between the existence results for the non-periodic corrector $w_p$ examined in the previous section and the actual local basis functions $\psi_i^{\varepsilon,K}$ of the MsFEM approaches discussed here. Up to irrelevant technicalities and details, the corrector and the local functions are intrinsically the same mathematical object: both are obtained by zooming in locally and solving the problem at the scale of its heterogeneities.


5 Homogenization Under Partial Information 


One way or another, all the approaches described so far, both at the theoretical level and the numerical level, rely on the full knowledge of the coefficient $A_\varepsilon$. It turns out that there are several practical contexts where such knowledge is incomplete, or sometimes simply unavailable. From an engineering perspective (think e.g. of experiments in Mechanics), there are indeed numerous prototypical situations for Eq. 1 where the response $u^\varepsilon$ can be measured for some loadings $f$, but where $A_\varepsilon$ is not completely known, let alone whether it is periodic or not. In these situations, it is thus not possible to use homogenization theory, nor to proceed with any MsFEM-type approach or with the similar approaches mentioned above. Finding an alternative pathway to the standard approaches is thus a practically relevant question. We are interested in approaches valid for the different regimes of $\varepsilon$, which make no use of the knowledge of the coefficient $A_\varepsilon$, but only use some responses of the medium obtained for certain given solicitations. Questions similar in spirit were addressed two decades ago by Durlofsky, where the point is also to define an effective coefficient using only outputs of the system. The approaches are however different in practice (see [36] for a detailed discussion).

For simplicity, we restrict ourselves to cases when Eq. 1 admits (possibly up to the extraction of a subsequence) a homogenized limit Eq. 5 where the homogenized matrix coefficient $A^*$ is deterministic and constant. This restrictive assumption on the class of $A^*$ (and thus on the structure of the coefficient $A_\varepsilon$ in Eq. 1) is useful for our theoretical justifications, but not mandatory for the approach to be applicable.

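For any constant matrix $A$, we consider generically the problem with constant coefficients

$$-\mathrm{div}(A \nabla u) = f \ \text{in } D, \qquad u = 0 \ \text{on } \partial D. \qquad (29)$$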
Fig. 4 Two extreme cases of meshes regarding intersections with the perforations: no intersection at all (top), or as many intersections as possible (bottom). The Crouzeix-Raviart version of MsFEM is, roughly, equally accurate in both situations. Source [35]
We investigate, for any value of the parameter $\varepsilon$, how we may define a constant symmetric matrix such that the solution $u(A, f) = u$ to Eq. 29 with matrix $A$ best approximates the solution to Eq. 1. The best constant matrix $A$ is (temporarily) defined as a minimizer of

$$I_\varepsilon = \inf_{\text{constant matrix } A > 0} \ \sup_{f \in L^2(D),\ \|f\|_{L^2(D)} = 1} \left\| u^\varepsilon(f) - u(A, f) \right\|^2_{L^2(D)}, \qquad (30)$$

where we have explicitly emphasized the dependency upon the right-hand side $f$ of the solutions to Eq. 1 and Eq. 29. The norm in Eq. 30 is an $L^2$ norm (and not e.g. an $H^1$ norm) because, for sufficiently small $\varepsilon$, we wish the best constant matrix $A$ to be close to $A^*$, while $u^\varepsilon$ converges strongly to $u^*$ only in the $L^2$ norm and not in the $H^1$ norm. The key point is that Eq. 30 is only based on the knowledge of the outputs $u^\varepsilon$ (that could be e.g. experimentally measured), and not on that of $A_\varepsilon$ itself. The theoretical study of the minimization problem Eq. 30 has been carried out in [36]. In particular, it has been proven that, under classical assumptions, the matrices $A$ with energy asymptotically close to the infimum $I_\varepsilon$ all converge to $A^*$ as $\varepsilon$ vanishes. In passing, we note that the approach provides, at least in some settings, a characterization of the homogenized matrix which is an alternative to the standard characterization of homogenization theory. To the best of our knowledge, this characterization, although probably known, has never been made explicit in the literature.

In fact (and this does not alter the above theoretical results), the actual minimization problem we use in practice reads

$$\inf_{\text{constant matrix } A > 0} \ \sup_{f \in L^2(D),\ \|f\|_{L^2(D)} = 1} \left\| (-\Delta)^{-1}\left(-\mathrm{div}\left(A \nabla u^\varepsilon(f)\right) - f\right) \right\|^2_{L^2(D)}, \qquad (31)$$

where $(-\Delta)^{-1}$ is the inverse Laplacian operator supplied with homogeneous Dirichlet boundary conditions. The function minimized in Eq. 31 is related to the one of Eq. 30 through the application, inside the $L^2$ norm of the latter, of the zero-order differential operator $\Delta^{-1}\,\mathrm{div}(A \nabla\,\cdot)$: since $-\mathrm{div}(A \nabla u(A, f)) = f$, the argument of the norm in Eq. 31 equals $(-\Delta)^{-1}\left(-\mathrm{div}\left(A \nabla (u^\varepsilon(f) - u(A, f))\right)\right)$. Note that, in sharp contrast with Eq. 30, the function to minimize in Eq. 31 is now, formally, a second-order polynomial in $A$. This property significantly speeds up the computation of the infimum. The specific choice Eq. 31 was suggested to us by Albert Cohen.

Note also that, in practice, we cannot maximize over all right-hand sides $f$ in $L^2(D)$ (with unit norm); we therefore replace the supremum by a maximization over a finite-dimensional set of thoughtfully selected right-hand sides.
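A one-dimensional toy version of Eq. 31 shows why the functional is quadratic in the unknown coefficient. On $D = (0,1)$, with a constant scalar $a$ and $u^\varepsilon \in H^1_0(D)$, one has $(-\Delta)^{-1}\left(-(a\,(u^\varepsilon)')'\right) = a\,u^\varepsilon$, so the argument of the norm in Eq. 31 reduces to $a\,u^\varepsilon(f) - (-\Delta)^{-1} f$, whose squared norm is a polynomial of degree two in $a$. The Python sketch below assumes a hypothetical coefficient $a(x/\varepsilon)$ with $a(y) = 2 + \sin(2\pi y)$, whose homogenized value is the harmonic mean $\sqrt{3}$, and a hypothetical finite set of unit-norm right-hand sides $\sqrt{2}\sin(k\pi x)$, $k = 1, 2, 3$. The “measured” responses $u^\varepsilon(f)$ are simulated with the closed-form 1D solution, and the maximum of the quadratics is minimized on a grid of candidate values.

```python
import numpy as np

def cumulative_trapezoid(g, x):
    # Running integral of g from x[0], trapezoidal rule.
    return np.concatenate(([0.0], np.cumsum(0.5 * (g[1:] + g[:-1]) * np.diff(x))))

def solve_dirichlet_1d(a, f, x):
    """Closed-form solution of -(a u')' = f on (x[0], x[-1]), u = 0 at both ends:
    a u' = C - F with F' = f, and C chosen so that u vanishes at the right end."""
    F = cumulative_trapezoid(f, x)
    C = cumulative_trapezoid(F / a, x)[-1] / cumulative_trapezoid(1.0 / a, x)[-1]
    return cumulative_trapezoid((C - F) / a, x)

def l2_inner(g, h, x):
    return cumulative_trapezoid(g * h, x)[-1]

eps = 1e-3                                    # hypothetical small scale
x = np.linspace(0.0, 1.0, 2**16 + 1)
a_eps = 2.0 + np.sin(2.0 * np.pi * x / eps)   # used only to simulate "measurements"

# For each f: J_f(a) = ||a u_eps - v||^2 = alpha a^2 - 2 beta a + gamma, with v = (-Delta)^{-1} f.
quadratics = []
for k in (1, 2, 3):
    f = np.sqrt(2.0) * np.sin(k * np.pi * x)  # unit L2 norm on (0, 1)
    u_eps = solve_dirichlet_1d(a_eps, f, x)
    v = solve_dirichlet_1d(np.ones_like(x), f, x)
    quadratics.append((l2_inner(u_eps, u_eps, x), l2_inner(u_eps, v, x), l2_inner(v, v, x)))

# Minimize the max over f of these convex quadratics on a grid of candidate values.
a_grid = np.linspace(0.5, 4.0, 350_001)
J = np.max([al * a_grid**2 - 2 * be * a_grid + ga for al, be, ga in quadratics], axis=0)
a_hat = a_grid[np.argmin(J)]
```

The identified value is close to the harmonic mean $\sqrt{3} \approx 1.732$ and not to the arithmetic mean $2$, although the coefficient itself is never used in the minimization.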

In [36, 37], we have presented a series of numerical experiments using the above approach. Our tests have established that the approach is in particular able to accurately identify the homogenized matrix $A^*$ in the periodic case (with a computational time that is much larger than that of the classical approach, but this is not the point). More importantly, it is also able to complete this task in the random case (where the classical approach can be prohibitively expensive). Finally, since no particular structure of the coefficient $A_\varepsilon$ is used, it may be applied to a large variety of non-periodic structures.

Fig. 5 Homogenization approach within an Arlequin-type coupling: The fine-scale highly oscillatory model and the coarse-grained model (tentatively identical to the homogenized model) co-exist in an overlap region. The three regions described in the body of our text are displayed, along with the fine and coarse meshes. Source [27]

A remark is in order: in both periodic and random homogenization, the classical approach computes the homogenized coefficients by first approximating the corrector function. A fair comparison between the approaches can therefore only be achieved if the above approach also provides some approximation of the corrector function. This is indeed the case: the corrector can also be obtained in our approach, at a limited additional computational cost, as demonstrated in [36].

A variant of the above approach, originally introduced in [22], is currently under investigation in [27]. The purpose of this variant is also to approximate $A^*$ without explicitly using $A_\varepsilon$, and to achieve this in a robust, engineering-type manner. In a nutshell, the approach consists in considering a domain divided into three regions, see Fig. 5. The inner region and the outer region respectively contain only the oscillatory model of Eq. 1 and the tentative homogenized model of Eq. 29. In between these two regions, an overlap region where both models coexist is used for a smooth coupling. Specifically, the coupling is performed using an Arlequin-type approach (see again [22]), but this choice is not mandatory for the approach to perform. A linear Dirichlet boundary condition, say $u = x_1$, is imposed on the external surface of the domain. It intuitively plays the role of the right-hand side function $f$ in Eq. 31. At fixed, presumably small, $\varepsilon$, one then solves the minimization problem

$$J_\varepsilon = \inf_{\text{constant matrix } A > 0} \left\| \nabla (u - x_1) \right\|^2_{L^2}. \qquad (32)$$

In the limit of vanishing $\varepsilon$, it is established that $J_\varepsilon$ also vanishes and that the only minimizer is obtained for $A e_1 = A^* e_1$, where $e_1 = \nabla(x_1)$ is the first canonical vector of the ambient space $\mathbb{R}^d$. Repeating this procedure along each dimension of $\mathbb{R}^d$ eventually identifies the matrix $A^*$. Several computational improvements of the original approach are introduced in [27]. A numerical analysis is also presented.
Hyperbolic Model Reduction for Kinetic Equations

Zhenning Cai, Yuwei Fan, and Ruo Li


Abstract We give a brief historical review of moment model reduction for kinetic equations, particularly Grad’s moment method for the Boltzmann equation. We focus on the hyperbolicity of the reduced model, which is essential for the existence of classical solutions to the associated Cauchy problem. We then introduce the framework we have developed in recent years, which preserves the hyperbolic nature of the kinetic equations in great generality. Some of the latest progress on the comparison between models with and without hyperbolicity is presented to validate the hyperbolic moment models for rarefied gases.


1 Historical Overview 


Moment methods are a general class of modeling methodologies for kinetic equations. We would like to start this paper with a historical review of this topic. However, due to the huge number of references, a thorough overview would be lengthy and tedious. Therefore, in this section, we restrict ourselves to the methods related to the hyperbolicity of moment models. Even so, our review in the following paragraphs is far from exhaustive.

According to Sir J. H. Jeans [29], the kinetic picture of a gas is “a crowd of molecules, each moving on its own independent path, entirely uncontrolled by forces from the other molecules, although its path may be abruptly altered as regards both speed and direction, whenever it collides with another molecule or strikes the boundary of the containing vessel.” In order to describe the evolution of non-equilibrium gases using the phase-space distribution function, the Boltzmann equation was proposed [1] as a nonlinear seven-dimensional partial differential equation. The independent variables of the distribution function are the time, the spatial coordinates, and the velocity.

In most cases, the full Boltzmann equation cannot be solved, even numerically, at an affordable cost. One therefore has to characterize the motion of the gas by resorting to various approximation methods that describe the evolution of macroscopic quantities. One successful way to find approximate solutions is the Chapman-Enskog method [15, 18], which uses a power series expansion around the Maxwellian to describe slightly non-equilibrium gases. The method assumes that the distribution function can be approximated to any precision using only equilibrium variables and their derivatives. Alternatively, Grad’s moment method [24] was developed in the late 1940s. In this method, transport equations for macroscopic averages are obtained by taking velocity moments of the Boltzmann equation. The difficulty of this method is that the governing equations for the components of the $n$-th velocity moment also depend on components of the $(n+1)$-th moment. Therefore, one has to use a certain closing relation to get a closed system after the truncation.
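The closure problem can be seen in a simplified setting with a scalar velocity $v$ and a scalar space variable $x$ (an illustrative assumption, not Grad’s full three-dimensional setting). Multiplying the kinetic equation $\partial_t f + v\,\partial_x f = Q(f)$ by $v^n$ and integrating in $v$ gives

```latex
\frac{\partial}{\partial t}\int_{\mathbb{R}} v^{n} f \,\mathrm{d}v
  + \frac{\partial}{\partial x}\int_{\mathbb{R}} v^{n+1} f \,\mathrm{d}v
  = \int_{\mathbb{R}} v^{n} Q(f)\,\mathrm{d}v, \qquad n = 0, 1, 2, \ldots
```

so the evolution of the $n$-th moment $m_n = \int v^n f\,\mathrm{d}v$ involves the flux $m_{n+1}$. Truncating at order $M$ leaves $m_{M+1}$ undetermined, and this is exactly what a closing relation must supply.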

Among the models given by Grad’s method [24], Grad’s 13-moment system is the most basic one beyond the Navier-Stokes equations, since Grad’s models with fewer moments include neither the stress tensor nor the heat flux. It was remarked in [23] that Grad’s moment method could be regarded as mathematically equivalent to the Chapman-Enskog method in certain cases. Thus the derivation of Grad’s 13-moment system can be regarded as an application of perturbation theory to the Boltzmann equation around the equilibrium. It is therefore natural to hope that the 13-moment system will be valid in the vicinity of equilibrium, although it was not expected to be valid far away from the equilibrium distribution [25]. However, due to its complex mathematical expression, it is not even easy to check whether the system is hyperbolic, as pointed out in [2]. Only in 1993 was it eventually verified in [35, 36] that the 1D reduction of Grad’s 13-moment equations is hyperbolic around the equilibrium.

In 1958, Grad wrote the article “Principles of the kinetic theory of gases” in the Encyclopedia of Physics [26], where he classified his own method among the “more practical expansion techniques”. However, successful applications of the 13-moment system were hardly seen in the two decades following Grad’s classical paper of 1949, as mentioned in the comments by Cercignani [14]. One possible reason was found by Grad himself in [25], where it was pointed out that unphysical sub-shocks may appear in a shock profile for Mach numbers greater than a critical value. However, the appearance of sub-shocks gives no hint as to why Grad’s moment method does not work for slow flows. Nevertheless, Grad’s moment method was still said to “open a new era in gas kinetic theory” [27].

In our paper [5], we found, astonishingly, that in the 3D case the equilibrium is NOT an interior point of the hyperbolicity region of Grad’s 13-moment model. Consequently, even if the distribution function is arbitrarily close to the local equilibrium, the local existence of the solution of the 13-moment system, as a Cauchy problem for a first-order quasi-linear partial differential system without analytic data, cannot be guaranteed. The defects of the 13-moment model due to the lack of hyperbolicity had never been recognized as so severe a problem. The absence of hyperbolicity around local equilibrium is a candidate explanation for the overall failure of Grad’s moment method.

Since its discovery, the lack of hyperbolicity has been widely accepted as a deficiency of Grad’s moment method that severely restricts the application of the moment method. “There has been persistent efforts to impose hyperbolicity on Grad’s moment closure by various regularizations” [39], and much progress has been made in the past decades. For example, Levermore investigated the maximum entropy method and showed in [33] that the moment system obtained with such a method possesses global hyperbolicity. Unfortunately, this method is difficult to put into practice due to the lack of a finite analytical expression, and the equilibrium lies on the boundary of the realizability domain for any moment system containing heat flux [30]. Based on Levermore’s 14-moment closure, an affordable 14-moment closure was proposed in [34] as an approximation, which extends the hyperbolicity region to a great extent. Let us mention that in [5], we also derived a 13-moment system that is hyperbolic around the equilibrium.

Gaining hyperbolicity even around the equilibrium looked highly non-trivial, but things changed not long ago. Besides the achievement of local hyperbolicity around the equilibrium, the study of globally hyperbolic moment systems with large numbers of moments has also been very successful in recent years. In the 1D case, with both spatial and velocity variables being scalar, a globally hyperbolic moment system was derived in [3] by regularization. Motivated by this work, another type of globally hyperbolic moment system was then derived in [31] using a different strategy. The model in [3] is obtained by modifying only the last equation of Grad’s original system, and the model in [31] revises only its last two equations. The characteristic fields of these models (genuine nonlinearity, linear degeneracy, and some properties of shocks, contact discontinuities, and rarefaction waves) can be fully clarified, which shows that their wave structures are formally a natural extension of those of the Euler equations.

In [4], the regularization method of [3] is extended to multi-dimensional cases. Here the word “multi-dimensional” means that the dimensions of the spatial coordinates and of the velocity are arbitrary positive integers, which may differ. The complicated multi-dimensional models with global hyperbolicity, based on a Hermite expansion of the distribution function up to any degree, were systematically proposed in [4]. Their wave speeds and characteristic fields can be clarified as well. Later on, a multi-dimensional model with global hyperbolicity for an anisotropic weight function was derived in [20].

Achieving global hyperbolicity was definitely encouraging, but how the regularization worked in the aforementioned cases remained a mystery to us. In particular, the method cannot be applied to moment systems based on a spherical harmonic expansion of the distribution function, such as Grad’s 13-moment system. As we pointed out, hyperbolicity is essential for a moment model, while it is hard to obtain by a direct moment expansion of kinetic equations. To overcome this problem, we developed in [6] a systematic framework to perform moment model reduction that preserves global hyperbolicity. The framework works not only for models based on Hermite expansions of the distribution function in the Boltzmann equation, but also for any ansatz of the distribution function in the Boltzmann equation. Actually, the framework even works for kinetic equations in a fairly general form.

The framework developed in [6] was further presented in the language of projection operators in [19], where the underlying mechanism by which hyperbolicity is preserved during the model reduction procedure was further clarified. This is the basic idea of our discussion in the next section.


2 Theoretical Framework 


In this section, we briefly review the framework of [19] to construct globally hyperbolic moment systems from kinetic equations, as well as its variants and some further developments. To clarify the statement, we first present the definition of hyperbolicity:


Definition 1 The first-order system of equations

$$\frac{\partial w}{\partial t} + \sum_{d=1}^D A_d(w) \frac{\partial w}{\partial x_d} = 0, \qquad w \in G,$$

is hyperbolic at $w_0$ if, for any unit vector $n \in \mathbb{R}^D$, the matrix $\sum_{d=1}^D n_d A_d(w_0)$ is real diagonalizable; the system is called globally hyperbolic if it is hyperbolic for any $w \in G$.


Based on this definition, the analysis of the hyperbolicity of moment systems reduces to a problem of linear algebra: the analysis of the real diagonalizability of the coefficient matrices. Without knowing the exact values of the matrix entries, the real diagonalizability of a matrix has to be studied through sufficient conditions. Some of them are:

Condition 1 All its eigenvalues are real and it has as many linearly independent eigenvectors as its size.

Condition 2 All the eigenvalues of the matrix are real and distinct.

Condition 3 The matrix is symmetric or similar to a symmetric matrix.
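Condition 1 can be probed numerically for a given matrix, which helps to see how the three conditions differ. The Python sketch below is a floating-point heuristic, not a proof (the tolerance and the use of the eigenvector condition number are assumptions): it declares a matrix real diagonalizable when its eigenvalues have negligible imaginary parts and its eigenvector matrix is well conditioned. The example matrices are hypothetical.

```python
import numpy as np

def is_real_diagonalizable(A, tol=1e-8):
    """Numerical heuristic for Condition 1: real spectrum and a well-conditioned
    (hence linearly independent) set of eigenvectors."""
    eigvals, eigvecs = np.linalg.eig(A)
    if np.max(np.abs(eigvals.imag)) > tol:
        return False                       # complex eigenvalues: not hyperbolic
    return np.linalg.cond(eigvecs) < 1.0 / tol

symmetric = np.array([[2.0, 1.0], [1.0, 2.0]])   # Condition 3 holds
jordan = np.array([[1.0, 1.0], [0.0, 1.0]])      # real double eigenvalue, one eigenvector
rotation = np.array([[0.0, -1.0], [1.0, 0.0]])   # eigenvalues +i and -i

S = np.array([[1.0, 2.0], [0.0, 1.0]])
similar = np.linalg.inv(S) @ symmetric @ S       # not symmetric, but similar to a symmetric matrix
```

The Jordan block shows that real eigenvalues alone are not enough, which is why Condition 1 also requires eigenvectors; Conditions 2 and 3 avoid computing eigenvectors altogether.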


Grad [24] investigated the characteristic structure of the 1D reduction of Grad’s 13-moment system, whose hyperbolicity was further studied in [36] based on Condition 2. Afterwards, this condition was adopted in the proof of the hyperbolicity of the regularized moment system for the 1D case in [3]. It is worth noting that using Condition 2 usually requires computing the characteristic polynomial of the coefficient matrix of the moment system, which may be complicated or even impractical for large moment systems. Even if the characteristic polynomial is computed, showing that its roots are real and distinct is still highly nontrivial. This severely restricts the use of this condition in kinetic model reduction.

To study hyperbolicity in multi-dimensional cases, we have applied Condition 1 in [5] to show that Grad’s 13-moment system loses hyperbolicity even in an arbitrarily small neighborhood of the equilibrium, and in [4] to prove the global hyperbolicity of the regularized moment system for the multi-dimensional case. Due to the requirement on the eigenvectors, both proofs based on Condition 1 are complicated and tedious. By contrast, Condition 3 is much easier to check; based on it, Levermore provided a concise and clear proof of the hyperbolicity of the maximum entropy moment system in [33]. In [19], we revisited the hyperbolicity of the regularized moment systems in [3, 4] based on Condition 3 and then generalized it to a framework. Below we start our discussion with a review of these hyperbolic moment systems.


2.1 Review of Globally Hyperbolic Moment System 


Let us consider the Boltzmann equation:

$$\frac{\partial f}{\partial t} + \sum_{d=1}^D v_d \frac{\partial f}{\partial x_d} = Q(f), \qquad (1)$$

and denote the local equilibrium by $f_{eq}$, which satisfies $Q(f_{eq}) = 0$ and $f_{eq} > 0$.
The key idea of Grad’s moment method is to expand the distribution function as

$$f(t, x, v) \approx \sum_{|\alpha| \le M} f_\alpha(t, x)\, f_{eq}(t, x, v)\, \mathrm{He}_\alpha(t, x, v) = \sum_{|\alpha| \le M} f_\alpha(t, x)\, \mathcal{H}_\alpha(t, x, v) \qquad (2)$$

for a given integer $M \ge 2$, where, for the multi-dimensional index $\alpha \in \mathbb{N}^D$, $|\alpha| = \sum_{d=1}^D \alpha_d$, and the basis function $\mathcal{H}_\alpha$ is defined by $\mathcal{H}_\alpha = f_{eq} \mathrm{He}_\alpha$, with $\mathrm{He}_\alpha$ being the orthonormal polynomials of $v$ with weight function $f_{eq}$. When $f_{eq}$ is the local Maxwellian, $\mathrm{He}_\alpha$ can be obtained by translation and scaling of Hermite polynomials. Grad’s moment system can then be obtained by substituting the expansion into the Boltzmann equation and matching the coefficients of $\mathcal{H}_\alpha$ with $|\alpha| \le M$. To describe this procedure clearly, we assume that the distribution function $f$ is defined on a space $\mathbb{H}$ spanned by the basis functions $\mathcal{H}_\alpha$ for all $\alpha \in \mathbb{N}^D$, and we let $\mathbb{H}_M := \mathrm{span}\{\mathcal{H}_\alpha : |\alpha| \le M\}$ be the subspace for our model reduction. Then one can introduce the projection from $\mathbb{H}$ to $\mathbb{H}_M$ as
$$\mathcal{P} f = \sum_{|\alpha| \le M} f_\alpha \mathcal{H}_\alpha, \qquad \text{with } f_\alpha = (f, \mathcal{H}_\alpha), \qquad (3)$$

where the inner product is defined as $(f, g) = \int_{\mathbb{R}^D} f g / f_{eq} \, \mathrm{d}v$. The projection accurately describes Grad’s expansion (2) and provides a tool to study the operators in the space $\mathbb{H}_M$. For example, matching the coefficients of the basis functions $\mathcal{H}_\alpha$ with $|\alpha| \le M$ can be understood as projecting the system onto the space $\mathbb{H}_M$. Hence, Grad’s moment system is written as


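$$\mathcal{P} \frac{\partial \mathcal{P} f}{\partial t} + \sum_{d=1}^D \mathcal{P}\left(v_d \frac{\partial \mathcal{P} f}{\partial x_d}\right) = \mathcal{P} Q(\mathcal{P} f). \qquad (4)$$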
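For a one-dimensional velocity variable and the standard Maxwellian $f_{eq}(v) = e^{-v^2/2}/\sqrt{2\pi}$ (unit density, zero velocity, unit temperature, an illustrative assumption), the orthonormal polynomials are $\mathrm{He}_n/\sqrt{n!}$, built from the probabilists’ Hermite polynomials, and $f_\alpha = (f, \mathcal{H}_\alpha) = \int f\,\mathrm{He}_\alpha\,\mathrm{d}v$. The Python sketch below computes the coefficients in (3) for a hypothetical two-bump distribution, using NumPy’s `hermeval` and a trapezoidal rule on a truncated velocity grid (both choices are assumptions of the sketch), and checks that $\mathcal{P} f$ keeps the velocity moments of $f$ up to order $M$.

```python
import numpy as np
from math import factorial, sqrt, pi
from numpy.polynomial.hermite_e import hermeval

M = 4                                         # hypothetical truncation order
v = np.linspace(-12.0, 12.0, 24_001)
dv = v[1] - v[0]

def integrate(g):
    # Trapezoidal rule on the uniform velocity grid (integrands decay fast).
    return dv * (g.sum() - 0.5 * (g[0] + g[-1]))

def gaussian(v, mean, std):
    return np.exp(-0.5 * ((v - mean) / std) ** 2) / (std * sqrt(2.0 * pi))

f_eq = gaussian(v, 0.0, 1.0)                  # standard Maxwellian

def He(n, v):
    # Orthonormal Hermite polynomial of degree n with respect to the weight f_eq.
    return hermeval(v, [0.0] * n + [1.0]) / sqrt(factorial(n))

# Hypothetical non-equilibrium distribution: two shifted, narrower bumps (unit mass).
f = 0.5 * gaussian(v, -1.0, 0.6) + 0.5 * gaussian(v, 1.2, 0.5)

# Coefficients (3): f_alpha = int f * H_alpha / f_eq dv = int f * He_alpha dv.
coeffs = np.array([integrate(f * He(n, v)) for n in range(M + 1)])
Pf = f_eq * sum(c * He(n, v) for n, c in enumerate(coeffs))

moments_f = [integrate(v**k * f) for k in range(M + 1)]
moments_Pf = [integrate(v**k * Pf) for k in range(M + 1)]
```

Moment preservation holds because every monomial $v^k$ with $k \le M$ lies in the span of $\mathrm{He}_0, \ldots, \mathrm{He}_M$; matching coefficients in $\mathbb{H}_M$ is therefore the same as matching the moments of order at most $M$.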

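Let $\mathbf{H}$ be the vector whose components are all the basis functions $\mathcal{H}_\alpha$ with $|\alpha| \le M$, listed in a given order. Since $\mathcal{P} f$ is a function in $\mathbb{H}_M$, one can collect all the independent variables of $\mathcal{P} f$ into a vector $w$, whose length equals the dimension of $\mathbb{H}_M$. Thanks to the definition of the projection operator $\mathcal{P}$, there exist square matrices $D$ and $B_d$, $d = 1, \ldots, D$, such that

$$\mathcal{P} \frac{\partial \mathcal{P} f}{\partial t} = \mathbf{H}^T D \frac{\partial w}{\partial t}, \qquad \mathcal{P}\left(v_d \frac{\partial \mathcal{P} f}{\partial x_d}\right) = \mathbf{H}^T B_d \frac{\partial w}{\partial x_d}. \qquad (5)$$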


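Accordingly, letting $Q$ be the vector such that $\mathcal{P} Q(\mathcal{P} f) = \mathbf{H}^T Q$, one can rewrite Grad’s moment system as

$$D \frac{\partial w}{\partial t} + \sum_{d=1}^D B_d \frac{\partial w}{\partial x_d} = Q. \qquad (6)$$

Actually, the system (6) is the vector form of (4) in $\mathbb{H}_M$ with the basis $\mathcal{H}_\alpha$. Comparing these equations, we have the following correspondences:

$$w \leftrightarrow \mathcal{P} f, \qquad D \frac{\partial}{\partial t} \leftrightarrow \mathcal{P} \frac{\partial}{\partial t}, \qquad B_d \frac{\partial}{\partial x_d} \leftrightarrow \mathcal{P} v_d \frac{\partial}{\partial x_d}, \qquad Q \leftrightarrow \mathcal{P} Q(\mathcal{P} f). \qquad (7)$$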


Furthermore, the procedure to derive Grad’s moment system is diagrammed in Fig. 1a. It was observed in [19] that the time derivative and the spatial derivative are treated differently in this process: a projection operator is applied directly to the time derivative, while for the spatial derivative, this projection operator appears only after multiplication by the velocity $v$. This difference causes the loss of hyperbolicity. Based on this observation, we have drawn a key conclusion in [19]: one should add a projection operator right in front of the spatial derivative to regain hyperbolicity, as illustrated in Fig. 1b. The corresponding moment system is
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time derivative convection term 


multiply 
9Pf i» 
Oxa velocity 


uorjoefo1d 
uorjoefo1d 
uorjoefoxd 


(A) Grad's moment system 


time derivative convection term 


uorjoefoad 
th 
CPA 
Lo 
uoroafoid 


By, 
uorjoefo1d 


(B) Hyperbolic regularized moment system 


Fig. 1 Diagram of the procedure of Grad's and regularized moment system 


pê? Pf OP f 
ae D uP TOOR. (8) 
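where the additional projection operator is the one applied directly to $\partial \mathcal{P}f/\partial x_d$. By the same argument as for the time derivative in (5), $\mathcal{P}\frac{\partial \mathcal{P}f}{\partial x_d} = \mathbf{H}^T D \frac{\partial \mathbf{w}}{\partial x_d}$; since $\mathcal{P} v_d$ maps $\mathbb{H}_M$ into itself, there exist square matrices $M_d$, $d = 1, \ldots, D$, such that

$$
\mathcal{P}\left(v_d\, \mathcal{P}\frac{\partial \mathcal{P}f}{\partial x_d}\right) = \mathbf{H}^T M_d D \frac{\partial \mathbf{w}}{\partial x_d}, \qquad (9)
$$

and we obtain the vector form of the regularized moment system as

$$
D \frac{\partial \mathbf{w}}{\partial t} + \sum_{d=1}^{D} M_d D \frac{\partial \mathbf{w}}{\partial x_d} = \mathbf{Q}. \qquad (10)
$$

Similar to (7), we have one more correspondence:

$$
M_d \leftrightarrow \mathcal{P} v_d, \qquad (11)
$$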


that is to say, the matrices $M_d$ are the representations of the operators $\mathcal{P} v_d$ on $\mathbb{H}_M$. It is not difficult to check that the matrices $M_d$ are symmetric due to the orthonormality of the basis functions $\mathcal{H}_\alpha$, so that any linear combination $\sum_d n_d M_d$ is real diagonalizable. One can also check that the matrix $D$ is invertible. Hence, for every unit vector $\mathbf{n}$, the matrix $D^{-1}\bigl(\sum_d n_d M_d\bigr) D$ is similar to $\sum_d n_d M_d$, so that the system (10) is globally hyperbolic. Moreover, if one multiplies both sides of (10) by $D^T$, the resulting system

$$
D^T D \frac{\partial \mathbf{w}}{\partial t} + \sum_{d=1}^{D} D^T M_d D \frac{\partial \mathbf{w}}{\partial x_d} = D^T \mathbf{Q} \qquad (12)
$$

turns out to be a symmetric hyperbolic system of balance laws.
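In one velocity dimension the symmetry argument can be checked numerically. The sketch below assumes (our choice, for illustration) an orthonormal Hermite basis centred at a local velocity $u$ and temperature $\theta$. Writing $v = u + \sqrt{\theta}\,\xi$ and using the three-term recurrence $\xi\psi_k = \sqrt{k+1}\,\psi_{k+1} + \sqrt{k}\,\psi_{k-1}$, the matrix of $\mathcal{P}v$ on $\mathbb{H}_M$ is $uI + \sqrt{\theta}\,J$ with $J$ symmetric tridiagonal. Its eigenvalues are $u + \sqrt{\theta}$ times the zeros of $\mathrm{He}_{M+1}$. Conjugating by an invertible matrix leaves this real spectrum unchanged; the random matrix used for that check is a hypothetical stand-in, not the actual $D$ of (10).

```python
import numpy as np
from numpy.polynomial import hermite_e as He

def velocity_matrix(M, u=0.0, theta=1.0):
    """Matrix of P v on span{H_0, ..., H_M} for an assumed 1D orthonormal Hermite
    basis centred at (u, theta): v = u + sqrt(theta) * xi, and
    xi * psi_k = sqrt(k + 1) * psi_{k+1} + sqrt(k) * psi_{k-1}."""
    off = np.sqrt(np.arange(1, M + 1))           # sqrt(1), ..., sqrt(M)
    J = np.diag(off, 1) + np.diag(off, -1)
    return u * np.eye(M + 1) + np.sqrt(theta) * J

M, u, theta = 4, 0.5, 2.0
A = velocity_matrix(M, u, theta)                 # symmetric by construction

speeds = np.linalg.eigvalsh(A)                   # ascending real eigenvalues
roots = np.sort(np.real(He.hermeroots([0] * (M + 1) + [1])))  # zeros of He_{M+1}

# Hypothetical invertible stand-in for D: D^{-1} A D keeps the same real spectrum.
rng = np.random.default_rng(0)
D = rng.standard_normal((M + 1, M + 1)) + (M + 1) * np.eye(M + 1)
ev = np.linalg.eigvals(np.linalg.solve(D, A @ D))
print(speeds, np.sort(ev.real))
```

The eigenvalues are the characteristic speeds of the regularized system in this setting; in particular they stay real however far $\mathbf{w}$ is from equilibrium, which is the practical meaning of "globally hyperbolic".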


2.2 Hyperbolic Regularization Framework 


So far, the hyperbolicity of (10) has been proved using Condition 3. Looking back on the whole procedure, one finds that the key point of the hyperbolic regularization is the extra projection operator in front of the spatial differentiation operator in (8). Moreover, the underlying mechanism for obtaining hyperbolicity extends to much more general cases. For example, the radiative transfer equation has the form

$$
\frac{\partial f(t,\mathbf{x},\theta,\varphi)}{\partial t} + \boldsymbol{\xi}(\theta,\varphi) \cdot \nabla_{\mathbf{x}} f(t,\mathbf{x},\theta,\varphi) = Q(f)(t,\mathbf{x},\theta,\varphi), \qquad \mathbf{x} \in \mathbb{R}^3,\ \theta \in [0,\pi),\ \varphi \in [0,2\pi),
$$

where the velocity is given by $\boldsymbol{\xi}(\theta,\varphi) = (\sin\theta\cos\varphi, \sin\theta\sin\varphi, \cos\theta)^T$. To derive reduced models, one can replace the local equilibrium $f_{eq}$ in (2) by a nonnegative weight function $\omega$. Correspondingly, the orthogonal polynomials $\mathrm{He}_\alpha$ are replaced by basis functions $\phi_\alpha$ that are orthogonal in the $L^2$ space weighted by $\omega$, so that the basis functions $\mathcal{H}_\alpha$ become $\Phi_\alpha := \omega\phi_\alpha$. Letting $\mathbb{H}_M := \mathrm{span}\{\Phi_\alpha : |\alpha| \le M\}$, one can define the projection operator $\mathcal{P}$ as in (3). As an extension of the globally hyperbolic moment system, we obtain

$$
\mathcal{P}\frac{\partial \mathcal{P}f}{\partial t} + \sum_{d=1}^{3} \mathcal{P}\left(\xi_d\, \mathcal{P}\frac{\partial \mathcal{P}f}{\partial x_d}\right) = \mathcal{P}Q(\mathcal{P}f). \qquad (13)
$$

Again, if the corresponding matrix D as in (6) is invertible, the resulting moment system is globally hyperbolic. We refer readers to [6, 19, 21] for more details of such applications to radiative transfer equations.
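The weighted-basis extension can be illustrated in the simplest case we can think of; this example is our own choice and not one of the models of [6, 19, 21]. In slab geometry with azimuthal symmetry, only $\xi_3 = \mu = \cos\theta$ enters the convection term. Taking the constant weight $\omega = 1/2$ on $\mu \in [-1, 1]$ and orthonormal Legendre polynomials as the $\phi_\alpha$ gives the classical $P_N$ setting. Since $\omega$ does not depend on $f$ here, $\partial \mathcal{P}f/\partial x_3$ already lies in $\mathbb{H}_M$ and the extra projection acts trivially. The matrix of $\mathcal{P}\mu$ is then a symmetric tridiagonal matrix whose eigenvalues are the Gauss-Legendre nodes. The function name is hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre as L

def slab_velocity_matrix(N):
    """Matrix of P mu on span{p_0, ..., p_N}, where p_l = sqrt(2l + 1) * P_l are
    Legendre polynomials orthonormal for the weight 1/2 on [-1, 1]. From the
    Legendre recurrence:
    mu * p_l = (l+1)/sqrt((2l+1)(2l+3)) p_{l+1} + l/sqrt((2l-1)(2l+1)) p_{l-1}."""
    l = np.arange(N)
    off = (l + 1) / np.sqrt((2 * l + 1) * (2 * l + 3))
    return np.diag(off, 1) + np.diag(off, -1)

N = 5
A = slab_velocity_matrix(N)
speeds = np.linalg.eigvalsh(A)      # real, because A is symmetric
nodes, _ = L.leggauss(N + 1)        # Gauss-Legendre nodes = zeros of P_{N+1}
print(speeds, np.sort(nodes))
```

All characteristic speeds lie in $(-1, 1)$, consistent with $|\boldsymbol{\xi}| = 1$.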

This framework provides a concise and clear procedure for deriving hyperbolic moment systems from a broad range of kinetic equations. It has been applied in many fields, including the anisotropic hyperbolic moment system for the Boltzmann equation [20], semiconductor device simulation [7], plasma simulation [11], density functional theory [8], quantum gas kinetic theory [16], and the rarefied relativistic Boltzmann equation [32].

2.3 Further Progress


The above framework provides an approach to handling the hyperbolicity of moment systems. However, hyperbolicity is not the only property of concern. Model reduction often requires preserving hyperbolicity together with other properties. Below we list some recent attempts in this direction.

One property of interest is the recovery of the asymptotic limits of the kinetic equations. For example, the first-order asymptotic hydrodynamic limit of the Boltzmann equation is the Navier-Stokes equations, and it is therefore desirable that the moment equations preserve this limit. For the classical Boltzmann equation, most moment systems automatically preserve the Navier-Stokes limit if the stress tensor and heat flux are included. For the quantum Boltzmann equation, however, the equilibrium has a very special form, so that the moment system derived directly from the framework, with the equilibrium taken as the weight function, violates the Navier-Stokes limit [16]. In this case, the authors of [16] proposed a method called local linearization to regularize the moment system. Specifically, assuming the Grad-type system has the form (6), define $M_d(\mathbf{w}) = B_d(\mathbf{w}) D(\mathbf{w})^{-1}$. In the regularization, the matrix $M_d(\mathbf{w})$ is replaced by $\hat{M}_d := M_d(\mathbf{w}_{eq})$, where $\mathbf{w}_{eq}$ is the local equilibrium of the state $\mathbf{w}$. This method yields both hyperbolicity and the Navier-Stokes limit. The symmetry of $\hat{M}_d$ is, however, lost, so that one has to use Condition 1 to prove hyperbolicity.

Another relevant work is the nonlinear moment system for the radiative transfer equation in [21, 22]. In order to retain the diffusion limit (similar to the Navier-Stokes limit for the Boltzmann equation), the authors pointed out that the projection operators at different places in (13) need not be the same, and revised (13) to

$$
\mathcal{P}\frac{\partial \mathcal{P}f}{\partial t} + \sum_{d=1}^{3} \mathcal{P}\left(\xi_d\, \tilde{\mathcal{P}}\frac{\partial \mathcal{P}f}{\partial x_d}\right) = \mathcal{P}Q(\mathcal{P}f). \qquad (14)
$$


The operators $\mathcal{P}$ and $\tilde{\mathcal{P}}$ are orthogonal projections onto different subspaces of $\mathbb{H}$. By a careful choice of the subspace for the operator $\tilde{\mathcal{P}}$, the diffusion limit can be achieved while the symmetry of the matrices corresponding to $M_d$ in (10) is preserved, leading again to global hyperbolicity. This generalization broadens the application of the hyperbolic regularization framework and allows more properties of the kinetic equations to be taken into account.

Besides the hyperbolicity of the convection term, one may also be interested in the well-posedness of the complete moment system, including the collision term. One related property is Yong's first stability condition [38], which imposes constraints on the convection term, the collision term, and their coupling. This stability condition is shown in [37] to be critical for the existence of solutions. In [17], the authors studied multiple Grad-type moment systems and confirmed that all of these systems satisfy Yong's first stability condition.

Under this concise and flexible framework, one may wonder what is sacrificed for hyperbolicity. Writing out the equations, one immediately observes that the form of balance laws is destroyed by the hyperbolic regularization. A natural question is: how should discontinuities in the solution be defined? More generally, one may ask: what is the effect of such a regularization on the accuracy of the model? In the following section, we provide some clues using numerical experiments.


3 Numerical Validation 


The application of the framework in the gas kinetic theory has been investigated in 
a number of works [3, 9, 10, 12], where many one- and two-dimensional examples 
have been numerically studied to show the validity of hyperbolic moment equations. 
However, these globally hyperbolic models, as an improvement of Grad’s original 
models, have never been compared with Grad’s models in terms of the modeling 
accuracy. The only direct comparison seen in the literature is in [10], wherein for 
a shock tube problem with a density ratio of 7.0, the simulation of Grad’s moment 
equations breaks down and the corresponding hyperbolic moment equations appear 
to be stable. Without running numerical tests for the same problem for which both 
models work and comparing the results, it could be questioned whether we lose 
accuracy when fixing the hyperbolicity. Such doubt may arise since the globally 
hyperbolic models can be considered as a partial linearization of Grad’s models 
about the local Maxwellians. 

In this section, we make such a straightforward comparison using the same numerical examples for both methods. For simplicity, we only consider one-dimensional physics, for which both x and v are scalars. In this case, the characteristic polynomial of the Jacobian of the flux function has an explicit formula [3], so that the hyperbolicity of Grad's equations can be easily checked. The underlying kinetic equation used in our tests is the Boltzmann-BGK equation with a constant relaxation time:

$$
\frac{\partial f}{\partial t} + v \frac{\partial f}{\partial x} = \frac{1}{\mathrm{Kn}}\left(f_{eq} - f\right). \qquad (15)
$$


The ansatz of the distribution function is given by (3), so that (4) stands for Grad's moment system and (8) stands for the hyperbolic moment system. Below we use two benchmark tests to show the performance of both types of models. In general, both Grad's moment equations and the hyperbolic moment equations are solved by the first-order finite volume method with the local Lax-Friedrichs numerical flux. Time splitting is applied to solve the advection part and the collision part separately, and the forward Euler method is applied to each part. The time step is determined by the CFL condition with Courant number 0.9. For Grad's moment method, the maximum characteristic speed is obtained by solving for the roots of the characteristic polynomial of the Jacobian, whose explicit expression is given in [3]. For the hyperbolic moment method, the maximum characteristic speeds have been computed in [3]. The explicit form of the hyperbolic moment system (given in [3]) shows that its last equation contains a non-conservative product, which is discretized by central differences. In all the numerical examples, the number of grid cells is 1000 unless otherwise specified. Our convergence tests show that, for smooth solutions, this resolution yields solutions so close to those on a much finer grid that the difference is invisible to the naked eye. When presenting the numerical results, we mainly focus on the equilibrium variables, namely the density $\rho$, velocity $u$, and temperature $\theta$, which are defined by


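$$
\rho(t,x) = \int_{\mathbb{R}} f(t,x,v)\,\mathrm{d}v, \qquad u(t,x) = \frac{1}{\rho(t,x)} \int_{\mathbb{R}} v f(t,x,v)\,\mathrm{d}v, \qquad \theta(t,x) = \frac{1}{\rho(t,x)} \int_{\mathbb{R}} \bigl(v - u(t,x)\bigr)^2 f(t,x,v)\,\mathrm{d}v.
$$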
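The equilibrium variables are plain velocity moments of $f$. A minimal sketch, assuming $f$ is sampled on a uniform velocity grid and integrated with the rectangle rule (our choice; the quadrature used in the experiments is not stated here). Both function names are hypothetical.

```python
import math
import numpy as np

def equilibrium_variables(f, v):
    """Density, velocity and temperature of a 1D distribution f sampled on the
    uniform velocity grid v (rectangle rule)."""
    dv = v[1] - v[0]
    rho = np.sum(f) * dv
    u = np.sum(v * f) * dv / rho
    theta = np.sum((v - u) ** 2 * f) * dv / rho
    return rho, u, theta

def maxwellian(v, rho, u, theta):
    """Local Maxwellian with density rho, velocity u and temperature theta."""
    return rho / math.sqrt(2 * math.pi * theta) * np.exp(-(v - u) ** 2 / (2 * theta))

v = np.linspace(-20.0, 20.0, 2001)
print(equilibrium_variables(maxwellian(v, 2.0, 0.5, 1.5), v))
```

For a Maxwellian the function returns its own parameters up to quadrature error, which makes it a convenient check on any velocity grid before it is used on non-equilibrium data.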


3.1 Shock Structure 


The structure of plane shock waves is frequently used as a benchmark test in gas kinetic theory. It shows that the physical shock, which appears as a discontinuity in the Euler equations, is actually a smooth transition from one state to another. The computational domain is $(-\infty, +\infty)$, so that no boundary condition is involved, and the initial data are

$$
f(0,x,v) = \begin{cases} \dfrac{\rho_l}{\sqrt{2\pi\theta_l}} \exp\left(-\dfrac{(v - u_l)^2}{2\theta_l}\right), & \text{if } x < 0, \\[2ex] \dfrac{\rho_r}{\sqrt{2\pi\theta_r}} \exp\left(-\dfrac{(v - u_r)^2}{2\theta_r}\right), & \text{if } x > 0, \end{cases} \qquad (16)
$$


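where all the equilibrium variables are determined by the Mach number Ma:

$$
\rho_l = 1, \quad u_l = \sqrt{3}\,\mathrm{Ma}, \quad \theta_l = 1, \qquad \rho_r = \frac{2\mathrm{Ma}^2}{\mathrm{Ma}^2 + 1}, \quad u_r = \frac{\sqrt{3}\,\mathrm{Ma}}{\rho_r}, \quad \theta_r = \frac{3\mathrm{Ma}^2 - 1}{2\rho_r}.
$$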
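These relations can be checked against the Rankine-Hugoniot conditions. The sketch assumes (our reading of the one-dimensional BGK model) that the macroscopic fluxes are those of a gas with pressure $\rho\theta$ and internal energy $\rho\theta/2$, so that the sound speed is $\sqrt{3\theta}$, which matches $u_l = \sqrt{3}\,\mathrm{Ma}$. Under that reading, both states carry equal mass, momentum and energy fluxes, and $\mathrm{Ma} = 2$ gives $\theta_r = 55/16$, the value used in Case 5 below. Function names are hypothetical.

```python
import math

def shock_states(Ma):
    """Upstream (x < 0) and downstream (x > 0) states (rho, u, theta) of (16)."""
    rho_l, u_l, theta_l = 1.0, math.sqrt(3) * Ma, 1.0
    rho_r = 2 * Ma**2 / (Ma**2 + 1)
    u_r = math.sqrt(3) * Ma / rho_r
    theta_r = (3 * Ma**2 - 1) / (2 * rho_r)
    return (rho_l, u_l, theta_l), (rho_r, u_r, theta_r)

def euler_fluxes(rho, u, theta):
    """Mass, momentum and energy fluxes, assuming pressure rho*theta and
    total energy E = rho*u^2/2 + rho*theta/2."""
    E = 0.5 * rho * u**2 + 0.5 * rho * theta
    return (rho * u, rho * u**2 + rho * theta, u * (E + rho * theta))

left, right = shock_states(2.0)
print(right[2])  # approximately 55/16 = 3.4375
print(euler_fluxes(*left), euler_fluxes(*right))
```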
Fig. 2 Left: The comparison of shock structures of two solutions with Mach number 1.4 and M = 4. Right: The green area is the hyperbolicity region (horizontal axis: $\hat f_{M-1}$, vertical axis: $\hat f_M$), and the red loop is the parametric curve $(\hat f_{M-1}, \hat f_M)$ with parameter $x$


We are interested in the steady state of this problem. Since the parameter Kn only introduces a uniform spatial scaling, it does not affect the shock structure. Therefore, we simply set it to 1. Numerically, we set the computational domain to $[-30, 30]$. The boundary condition is provided by the ghost-cell method, and the distribution functions on the ghost cells are set to the two states defined in (16).
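The discretization described at the beginning of this section can be sketched together with the ghost-cell treatment: first-order finite volumes, the local Lax-Friedrichs flux, forward Euler, Courant number 0.9, and splitting of advection and collision. This is a stand-in under stated assumptions, not the authors' solver. The moment systems' fluxes and characteristic speeds are given in [3] and not reproduced here, so the advection step uses the one-dimensional Euler equations with pressure $\rho\theta$ and maximum speed $|u| + \sqrt{3\theta}$. The collision step assumes the moment vector is ordered as $(\rho, u, \theta, f_3, \ldots, f_M)$, so that forward Euler on the BGK term damps $f_\alpha$, $\alpha \ge 3$, by the factor $1 - \Delta t/\mathrm{Kn}$. All function names are hypothetical, and the non-conservative product of the hyperbolic system is not treated.

```python
import numpy as np

def primitive(U):
    """Conserved (rho, rho*u, E) -> (rho, u, theta), with E = rho*u^2/2 + rho*theta/2."""
    rho = U[:, 0]
    u = U[:, 1] / rho
    theta = 2.0 * U[:, 2] / rho - u**2
    return rho, u, theta

def flux(U):
    rho, u, theta = primitive(U)
    p = rho * theta
    return np.stack([rho * u, rho * u**2 + p, u * (U[:, 2] + p)], axis=1)

def max_speed(U):
    _, u, theta = primitive(U)
    return np.abs(u) + np.sqrt(3.0 * theta)

def llf_step(U, dx, cfl=0.9):
    """One forward-Euler advection step. U has shape (N + 2, 3); rows 0 and N + 1
    are ghost cells that are never updated (they hold the two far-field states)."""
    a = max_speed(U)
    dt = cfl * dx / a.max()                        # CFL condition
    F = flux(U)
    a_face = np.maximum(a[:-1], a[1:])             # local speed at the N + 1 interfaces
    F_face = 0.5 * (F[:-1] + F[1:]) - 0.5 * a_face[:, None] * (U[1:] - U[:-1])
    U_new = U.copy()
    U_new[1:-1] -= dt / dx * (F_face[1:] - F_face[:-1])
    return U_new, dt

def bgk_step(w, dt, kn):
    """Forward-Euler collision step for rows w = (rho, u, theta, f_3, ..., f_M)."""
    w = w.copy()
    w[:, 3:] *= 1.0 - dt / kn                      # rho, u, theta are left unchanged
    return w

def conserved(rho, u, theta):
    return np.array([rho, rho * u, 0.5 * rho * u**2 + 0.5 * rho * theta])

# Usage: the Ma = 1.4 states of (16) on [-30, 30] with 1000 cells plus 2 ghost cells.
Ma, N = 1.4, 1000
dx = 60.0 / N
rho_r = 2 * Ma**2 / (Ma**2 + 1)
left = conserved(1.0, np.sqrt(3) * Ma, 1.0)
right = conserved(rho_r, np.sqrt(3) * Ma / rho_r, (3 * Ma**2 - 1) / (2 * rho_r))
xc = -30.0 + (np.arange(N + 2) - 0.5) * dx         # cell centres, ghosts included
U = np.where(xc[:, None] < 0, left, right)
for _ in range(200):
    U, dt = llf_step(U, dx)
```

The ghost rows keep the two far-field states throughout, which is how the boundary condition of this test enters the scheme.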


3.1.1 Case 1: Ma = 1.4 and M = 4 


In this case, both Grad's system and the hyperbolic moment system work owing to the relatively small Mach number. The numerical results are shown in Fig. 2. By convention, we plot the normalized density, velocity, and temperature defined by

$$
\bar\rho = \frac{\rho - \rho_l}{\rho_r - \rho_l}, \qquad \bar u = \frac{u - u_r}{u_l - u_r}, \qquad \bar\theta = \frac{\theta - \theta_l}{\theta_r - \theta_l},
$$

so that the values of all variables generally lie within the range [0, 1], unless a temperature overshoot is observed.

Figure 2b shows the hyperbolicity region of Grad's moment equations. It has been proven in [3] that for one-dimensional physics, the hyperbolicity region can be characterized by the following two dimensionless quantities:

$$
\hat f_{M-1} = \frac{f_{M-1}}{\rho\,\theta^{(M-1)/2}}, \qquad \hat f_M = \frac{f_M}{\rho\,\theta^{M/2}},
$$

where $f_M$ and $f_{M-1}$ are the last two coefficients in the expansion (3). The red curve in Fig. 2b shows the trajectory of Grad's solution in this diagram. For such a small Mach number, the whole solution lies well inside the hyperbolicity region, so that the simulation of Grad's moment equations is stable. Figure 2a shows that both methods provide smooth shock structures, and the predictions for all the equilibrium variables are similar. This example confirms the applicability of both systems in weakly non-equilibrium regimes. Note that for one-dimensional physics, Grad's equations do not suffer from the loss of hyperbolicity near equilibrium.

Fig. 3 Left: The comparison of shock structures of two solutions with Mach number 2.0 and M = 4. Right: The green area is the hyperbolicity region (horizontal axis: $\hat f_{M-1}$, vertical axis: $\hat f_M$), and the red loop is the parametric curve $(\hat f_{M-1}, \hat f_M)$ with parameter $x$


3.1.2 Case 2: Ma = 2.0 and M = 4 


Now we increase the Mach number to introduce stronger non-equilibrium. The same plots are provided in Fig. 3. In this example, despite the numerical diffusion, discontinuities can easily be identified in the numerical solutions. These discontinuities, also known as subshocks, appear because the characteristic speed in front of the shock wave is insufficient, meaning that both systems are inadequate for describing the physics. To capture these discontinuities, 8000 grid cells are used in the spatial discretization. This example shows significantly different shock structures predicted by the two methods. For Grad's moment equations, the subshock is located near $x = -7$, while for the hyperbolic moment equations, it appears near $x = -5$. The wave structures also differ considerably. Focusing on the high-density region, we find that the solution of the hyperbolic moment equations is smoother, suggesting a possibly better description of the physics.

Here we remind readers that the wave structure of the hyperbolic moment equations may depend on the numerical method, due to their non-conservative nature. The locations and strengths of the subshocks may change when different shock conditions are used. However, we would argue that it is meaningless to justify any solution with subshocks for the hyperbolic moment equations, since subshocks are unphysical and should not appear in the solution of the Boltzmann equation. In practice, the appearance of discontinuous solutions indicates an inadequate truncation of the series, which prompts us to increase M to obtain more reliable solutions without subshocks.

Figure 3b shows that Grad's solution still lies within the hyperbolicity region, although the curve is already quite close to its boundary. This example shows that even within its hyperbolicity region, Grad's moment method may lose its validity.


3.1.3 Case 3: Ma = 2.0 and M = 6 


Now we increase M and carry out the simulation again for Mach number 2.0. The results are given in Fig. 4. Although one might hope that a larger M provides a better solution, Grad's moment equations actually lead to computational failure. The numerical solution just before the computation breaks down is plotted in Fig. 4a. Figure 4b clearly shows that the failure is caused by the loss of hyperbolicity. We believe that this implies the non-existence of the solution.

In contrast, the simulation of the hyperbolic moment equations is still stable. As expected, it provides a smooth shock structure and improves on the result predicted with M = 4.


3.1.4 Case 4: Ma = 1.7 and M = 6 


In this example, we decrease the Mach number so that the shock structure of Grad's equations can be found. Figure 5a shows that the results of both systems generally agree with each other, but the hyperbolic moment equations provide smoother solutions than Grad's system, so they are likely to be more accurate. Therefore, the higher nonlinearity of Grad's system does not necessarily help provide better solutions.

Interestingly, the phase diagram plotted in Fig. 5b shows that Grad's solution has left the hyperbolicity region. Why the solution is still stable remains to be studied. Here we conjecture that the collision term and the numerical diffusion help stabilize the numerical solution during the evolution, and that for the steady-state equations, solutions of non-hyperbolic equations may still exist. Nevertheless, all the above numerical tests show the superiority of hyperbolic moment equations in both accuracy and stability.


3.1.5 Case 5: Ma = 2.0 and M = 10 


In this example, we show the failure of both systems for a larger M. In Fig. 6, we plot the results at $t = 0.8$, where both numerical solutions contain negative temperatures. The reason for this phenomenon was explained in [28]: it lies in the divergence of the approximation (3) as M tends to infinity. It is rigorously shown in [13] that when $\theta_r > 2\theta_l$, for the solution of the steady-state BGK equation, the limit of $\mathcal{P}f$ (see (3)) as $M \to \infty$ does not exist. Here, for Ma = 2.0, the temperature behind the shock wave is $\theta_r = 55/16 > 2 = 2\theta_l$. Thus, for a large M, the divergence leads to a poor approximation of the distribution function, which is reflected as a negative temperature in the numerical results. This divergence issue is independent of the subshock and the hyperbolicity, and should be regarded as a defect of both systems. Work on fixing the issue is ongoing.

Fig. 5 Left: The comparison of shock structures of two solutions with Mach number 1.7 and M = 6. Right: The green area is the hyperbolicity region (horizontal axis: $\hat f_{M-1}$, vertical axis: $\hat f_M$), and the red loop is the parametric curve $(\hat f_{M-1}, \hat f_M)$ with parameter $x$

Fig. 6 The numerical … number 2.0 and M = 10 (curves: $\bar\rho$, $\bar u$, $\bar\theta$ for Grad's and the hyperbolic moment equations; horizontal axis: $x$)
3.2 Fourier Flow


In this test, we are interested in the performance of both methods with wall boundary conditions. The gas lies between two fully diffusive walls located at $x = -1/2$ and $x = 1/2$. For the Boltzmann-BGK equation (15), the boundary conditions are

$$
f(t,-1/2,v) = \frac{\rho_l}{\sqrt{2\pi\theta_l}} \exp\left(-\frac{v^2}{2\theta_l}\right), \quad v > 0, \qquad f(t,1/2,v) = \frac{\rho_r}{\sqrt{2\pi\theta_r}} \exp\left(-\frac{v^2}{2\theta_r}\right), \quad v < 0,
$$

where $\theta_{l,r}$ are the temperatures of the walls, and $\rho_{l,r}$ are chosen such that

$$
\int_{\mathbb{R}} v f(t, \pm 1/2, v)\,\mathrm{d}v = 0.
$$


Following [24], the boundary conditions of the moment equations can be derived by taking odd moments of the diffusive boundary condition. We choose the initial condition

$$
f(0,x,v) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{v^2}{2}\right) \qquad (17)
$$

for all x. Again, we are concerned only with the steady state of the solution.
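The choice of $\rho_{l,r}$ in the diffusive boundary condition can be written out explicitly. A wall Maxwellian with density $\rho_w$ and temperature $\theta_w$ emits the mass flux $\rho_w\sqrt{\theta_w/(2\pi)}$, which must balance the flux of particles arriving at the wall. A minimal sketch, assuming $f$ at the wall is sampled on a uniform velocity grid; the function name and the `side` argument are hypothetical.

```python
import math
import numpy as np

def wall_density(f, v, theta_w, side):
    """Density of the wall Maxwellian that makes the net mass flux vanish.
    side = "left": wall at x = -1/2, particles arrive with v < 0;
    side = "right": wall at x = 1/2, particles arrive with v > 0."""
    dv = v[1] - v[0]
    mask = v < 0 if side == "left" else v > 0
    arriving = np.sum(np.abs(v[mask]) * f[mask]) * dv   # mass flux hitting the wall
    # int_0^inf v * exp(-v^2 / (2 theta_w)) / sqrt(2 pi theta_w) dv = sqrt(theta_w / (2 pi))
    return arriving / math.sqrt(theta_w / (2 * math.pi))

v = np.linspace(-10.0, 10.0, 2001)
f_wall = 1.3 / math.sqrt(2 * math.pi * 1.9) * np.exp(-v**2 / (2 * 1.9))
print(wall_density(f_wall, v, 1.9, "right"))  # close to 1.3: f is already the wall Maxwellian
```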

In our numerical experiments, we choose Kn = 0.3, $\theta_l = 1$ and M = 11. Two test cases, $\theta_r = 1.9$ and $\theta_r = 2.7$, are considered. For the smaller temperature ratio $\theta_r = 1.9$, the numerical results are given in Fig. 7, where the two solutions mostly agree with each other. The reference solution, computed using the discrete velocity model, is also provided in Fig. 7a. Both models provide reasonable approximations to the reference solution. The good behavior of Grad's solution can also be predicted from the phase diagram in Fig. 7b, which shows that the whole solution lies in the central area of the hyperbolicity region.

For $\theta_r = 2.7$, the results are plotted in Fig. 8. In this case, if we start the simulation of Grad's equations from the initial data (17), the computation breaks down due to the loss of hyperbolicity during the evolution. Therefore, we first run the simulation of the hyperbolic moment equations from the initial data (17) and evolve the solution to the steady state. This steady-state solution then serves as the initial data for Grad's equations. Although the steady-state solution of Grad's equations can be found using this technique, the approximation looks poorer than that of the hyperbolic moment equations. The phase diagram (Fig. 8b) shows that the solution near the left wall is outside the hyperbolicity region, so that the validity of the boundary conditions on the left wall becomes unclear. In contrast, the hyperbolic moment equations still provide a reliable approximation despite the high temperature ratio.
Fig. 7 Left: Steady Fourier flow for $\theta_r = 1.9$ (left vertical axis: $\rho$, right vertical axis: $\theta$). Right: The green area is the hyperbolicity region (horizontal axis: $\hat f_{M-1}$, vertical axis: $\hat f_M$), and the red line is the parametric curve $(\hat f_{M-1}, \hat f_M)$ with parameter $x$

Fig. 8 Left: Steady Fourier flow for $\theta_r = 2.7$ (left vertical axis: $\rho$, right vertical axis: $\theta$). Right: The green area is the hyperbolicity region (horizontal axis: $\hat f_{M-1}$, vertical axis: $\hat f_M$), and the red line is the parametric curve $(\hat f_{M-1}, \hat f_M)$ with parameter $x$


3.3 A Summary of Numerical Experiments 


In all the above numerical experiments, we see that despite the loss of some nonlin- 
earity, the hyperbolicity fix does not appear to lose accuracy in any of the numerical 
tests. In regimes with moderate non-equilibrium effects, Grad’s equations may pro- 
vide solutions outside the hyperbolicity region without numerical instability. In this 
situation, our experiments show that the hyperbolicity fix is likely to improve the 
accuracy of the model. It has also been demonstrated that other issues, such as sub- 
shocks and divergence, are not related to the hyperbolicity, and these issues have to 
be addressed independently. 


Hyperbolic Model Reduction for Kinetic Equations 155 


4 Conclusion 


The loss of hyperbolicity, one of the major obstacles to model reduction in gas kinetic theory, has been almost entirely cleared away by research in recent years. With the handy framework introduced in Sect. 2, we can safely shift the focus of model reduction to other properties such as asymptotic limits, stability, and convergence. Our numerical experiments show that the hyperbolic regularization does not harm the accuracy of the model. We hope that this framework can inspire more ideas in the development of dimensionality reduction, even beyond kinetic theory.
Transformation
Kazue Sako


Abstract Cryptography is implemented using discrete mathematics with security 
defined in complexity theory. In this article, we review some cryptographic primitives 
for encryption, signing messages and interactive proofs. By combining cryptographic 
primitives, we can design and digitally implement various services with desired 
features in security, privacy and fairness. We will discuss some examples such as 
electronic voting and cryptocurrencies. 


1 Digital Transformation 


Research in mathematics and cryptography will play a big role in shaping a better digitalized society in the coming years. There is an immense expectation that Information and Communications Technology, known as ICT, will transform our lives to be more efficient, more productive and more functional. However, this is only the bright side of digital transformation. We also need to take care to transform 'correctly' so that we do not suffer unexpected consequences.

One evident characteristic of ICT is that it frees us from physical constraints. Digital data have little weight, so we can make a thousand copies and send them a thousand miles at once. While this characteristic brings benefits, it also brings threats to our lives. We need alternative ways to create 'constraints' for those who are willing to harm us, and one promising approach to creating such constraints is the use of cryptography.

Cryptography started as a way to conceal information. We are able to design cryptographic algorithms for which it is computationally infeasible to recover the message without knowledge of a decryption key. There are rigorous mathematical proofs guaranteeing that this property indeed holds, based on the hardness of certain problems, such as NP problems or factorization. This computational difficulty can therefore serve as an alternative constraint in the digital world.


K. Sako (✉)
Waseda University, Tokyo, Japan
e-mail: kazuesako@aoni.waseda.jp


© The Author(s) 2022 159
T. Chacón Rebollo et al. (eds.), Recent Advances in Industrial and Applied
Mathematics, ICIAM 2019 SEMA SIMAI Springer Series 1,
https://doi.org/10.1007/978-3-030-86236-7_9


160 K. Sako 


In this article, we provide two examples of using cryptography to implement secure digital systems. One is the digitalization of voting systems, and the other is a digital payment system called Bitcoin. Before these two examples, we review some cryptographic primitives, namely encryption schemes, digital signature schemes and interactive proofs.


2 Cryptographic Foundations 


In this section, we will introduce three fundamental notions in cryptography. They 
are Encryption Schemes, Digital Signature Schemes and Interactive Proofs. 


2.1 Encryption Schemes 


First, we introduce two types of encryption schemes, which differ in how keys are used. The first type, called Symmetric-key encryption schemes, uses the same key for both encryption and decryption. This type of encryption scheme has existed since the age of Gaius Julius Caesar. The newer type is called Public-key encryption schemes or Asymmetric-key encryption schemes, where different keys are used for encryption and decryption. Moreover, the key used to encrypt data can be made public (Fig. 1).

Fig. 1 Two types of encryption schemes [Slide "Cryptographic Foundations I": symmetric-key encryption uses the same key on both sides; public-key (asymmetric-key) encryption uses the public key of the receiver to encrypt and "my secret key" to decrypt.]

Let us briefly discuss a mathematical model that defines encryption schemes and their security. Encryption schemes, either symmetric or asymmetric, can be modeled by three nondeterministic functions, namely KeyGeneration, Encryption and Decryption, with a security parameter k. KeyGeneration, on input k, outputs a key pair EncKey and DecKey. (In the case of Symmetric-key encryption schemes, EncKey = DecKey holds.) The Encryption Function, given a message m from its domain and EncKey, outputs a ciphertext c.


c = Encryption(k, m, EncKey) 


Similarly, Decryption Function, given a ciphertext c from its domain and DecKey, 
outputs a message m’. 


m' = Decryption(k, c, DecKey) 


A triplet of nondeterministic functions (KeyGeneration, Encryption, Decryption) is called an Encryption scheme if and only if: for any k, for any output (EncKey, DecKey) of KeyGeneration on input k, and for any message m in its domain,


m = Decryption(k, Encryption(k, m, EncKey), DecKey) 


holds. 
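The correctness condition can be exercised on a toy Symmetric-key scheme. The sketch below uses the one-time pad, which is our choice of example. It interprets the security parameter k (an assumption) as the message length in bytes and sets EncKey = DecKey. The function names mirror the text but are otherwise hypothetical.

```python
import secrets

def key_generation(k):
    """Toy symmetric-key scheme: a fresh random k-byte key used as EncKey and DecKey."""
    key = secrets.token_bytes(k)
    return key, key

def encryption(k, m, enc_key):
    """One-time pad: XOR the k-byte message with the k-byte key."""
    if len(m) != k or len(enc_key) != k:
        raise ValueError("message and key must both be k bytes long")
    return bytes(a ^ b for a, b in zip(m, enc_key))

def decryption(k, c, dec_key):
    """XOR with the same key undoes the encryption."""
    return bytes(a ^ b for a, b in zip(c, dec_key))

k = 16
enc_key, dec_key = key_generation(k)
m = b"attack at dawn!!"            # exactly 16 bytes
c = encryption(k, m, enc_key)
print(decryption(k, c, dec_key))   # the original message
```

The one-time pad hides the message perfectly only if each key is used for a single message, which is why KeyGeneration is run afresh for every encryption.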

As seen in the definition, even an Encryption function that returns m itself as c yields an Encryption Scheme. So we need to define what property an Encryption Scheme must have to be called secure. Cryptographers have studied various ways to do this. A fundamental observation is the following: given any two messages $m_1$ and $m_2$, and given a ciphertext c of either $m_1$ or $m_2$, the encryption scheme is secure if no one can guess which message c decrypts to with probability more than one half. To be more rigorous, we need to define this in an asymptotic manner. That is, for any $\epsilon > 0$, if we choose k large enough, the probability of guessing correctly is smaller than $1/2 + \epsilon$. We note that in Asymmetric Encryption Schemes, guessing must remain hard even for someone who knows the EncKey that was used to create c. There are various other security definitions for Encryption Schemes, some stronger and some weaker [1].

To prove the security of concrete Encryption Schemes, we assume the existence of one-way functions or the hardness of certain problems, such as factorization.
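The two-message guessing experiment can be simulated to see why the trivial scheme c = m is insecure. In this sketch, the adversary, message length and number of trials are arbitrary choices of ours. An adversary that simply compares the ciphertext with $m_1$ wins every time against the identity "encryption", but only about half the time against a one-time pad with a fresh key per trial. Python's `random.Random` is used only to make the run reproducible; it is not a cryptographic generator. `randbytes` requires Python 3.9 or later.

```python
import random

def identity_scheme(rng, k):
    """'Encryption' that returns m itself: correct by definition, but no secrecy."""
    return lambda m: m

def one_time_pad(rng, k):
    """One-time pad with a fresh k-byte key for this trial."""
    key = rng.randbytes(k)
    return lambda m: bytes(a ^ b for a, b in zip(m, key))

def adversary(m1, m2, c):
    """Hypothetical adversary: answer 0 (meaning m1) if c equals m1, otherwise 1."""
    return 0 if c == m1 else 1

def win_rate(scheme, k=16, trials=4000, seed=1):
    """Fraction of trials in which the adversary names the encrypted message."""
    rng = random.Random(seed)  # reproducible, NOT cryptographically secure
    wins = 0
    for _ in range(trials):
        m1, m2 = rng.randbytes(k), rng.randbytes(k)
        b = rng.randrange(2)                 # 0: encrypt m1, 1: encrypt m2
        encrypt = scheme(rng, k)
        c = encrypt(m2 if b else m1)
        wins += adversary(m1, m2, c) == b
    return wins / trials

print(win_rate(identity_scheme), win_rate(one_time_pad))
```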


2.2 Digital Signature Schemes 


Other exciting tools related to Public-key Encryption Schemes are Digital Signature Schemes. If we have two related keys, PubKey and PrivKey, such that PubKey can be published without compromising the secrecy of PrivKey, we can construct a scheme that serves as Digital Signatures. A person signs a message with PrivKey and outputs a signature sig. Anyone can verify whether or not the signature was generated using the key corresponding to PubKey by performing Verification (Fig. 2).
Fig. 2 Digital signature schemes [Panels: public-key encryption and digital signature; labels: public key of the receiver, my secret key, public key of the signer.]


Similarly, Digital Signature Scheme is modeled by three nondeterministic func- 
tions (KeyGen, Gen-SIG, Verify). KeyGen, on input security parameter k, outputs 
a key pair PrivKey and PubKey. Gen-SIG Function, given a message m from its 
domain and PrivKey, outputs a signature sig. 


sig = Gen-SIG(k, m, PrivKey) 


Verify Function, given a signature sig from its domain, the message m and PubKey, 
outputs either OK or NG. 


OK/NG = Verify(k, sig, m, PubKey) 


A triplet of nondeterministic functions (KeyGen, Gen-SIG, Verify) is called a Signature scheme if and only if: for any k, for any output (PrivKey, PubKey) of KeyGen on input k, and for any message m in its domain,


OK = Verify(k, Gen-SIG(k, m, PrivKey), m, PubKey) 


holds. 
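A signature scheme in the sense of this definition can be illustrated with textbook RSA applied to a hash of the message. This sketch rests on loud assumptions. The primes are tiny toy values, so the scheme is trivially breakable. The security parameter k is dropped, and the SHA-256 digest is reduced modulo n. Real deployments use large moduli and a padding scheme. The function names mirror the text but are hypothetical.

```python
import hashlib

def key_gen():
    """Toy textbook-RSA keys from p = 61, q = 53 (insecure, illustration only)."""
    p, q, e = 61, 53, 17
    n, phi = p * q, (p - 1) * (q - 1)
    d = pow(e, -1, phi)            # modular inverse, Python 3.8+
    return (n, d), (n, e)          # PrivKey, PubKey

def digest(m, n):
    """SHA-256 of the message, reduced modulo n."""
    return int.from_bytes(hashlib.sha256(m).digest(), "big") % n

def gen_sig(m, priv_key):
    n, d = priv_key
    return pow(digest(m, n), d, n)

def verify(sig, m, pub_key):
    n, e = pub_key
    return "OK" if pow(sig, e, n) == digest(m, n) else "NG"

priv_key, pub_key = key_gen()
sig = gen_sig(b"hello", priv_key)
print(verify(sig, b"hello", pub_key))
```

Because $x \mapsto x^e \bmod n$ is a bijection here, any altered signature value fails verification for the same message.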

For the security of signature schemes, we want to claim that only a person who knows PrivKey can generate a sig for m on which the Verify Function outputs OK. For this purpose, we call a Signature Scheme secure if, whenever there is an algorithm that can generate signatures on which Verify outputs OK, we can use that algorithm to 'extract' PrivKey. For the sake of space, please refer to [1] for a more mathematical definition of security for digital signature schemes.
Fig. 3 Interactive proofs [Slide "Cryptographic Foundations III: Interactive proofs"; panels: ordinary written-down proofs, interactive proofs.]


2.3 Interactive Proofs 


The last primitive we will discuss in the article is Interactive Proofs. In Mathematics, 
when we say Proof, it is usually something that can be written down in the paper and 
those who have seen the Proof can verify the correctness of its claim. So the script 
of Proof is non-interactive. The Prover alone would generate the script of Proof by 
himself. Also the script of Proof is transferable, that any party who have seen the 
Proof can verify that the claim is correct. 

In contrast, there are protocols in which a Prover and a Verifier communicate
interactively, and at the end the Verifier is persuaded that the claim is correct. These
are called interactive proofs (Fig. 3). Interactive proofs can provide a further property:
the Verifier learns nothing from the interaction except that the claim is correct. That
is, the Verifier gains no knowledge, or zero knowledge, by engaging in the protocol.
Such protocols are called zero-knowledge interactive proofs, and they are frequently
used in cryptographic protocols. Because the Verifier learns nothing new, he cannot
prove to a third party that the claim the Prover proved is correct.
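Schnorr's identification protocol is a classic interactive proof; it is our choice of illustration, not one taken from this article. In the sketch below (toy parameters, hypothetical function names), the Prover convinces the Verifier that it knows x with y = g^x mod p. The function simulate_transcript produces accepting transcripts without knowing x, which is the intuition for why a transcript convinces no third party. Strictly speaking, this protocol is zero knowledge only against an honest Verifier.

```python
import secrets

# Toy group (assumption, insecure size): p = 2q + 1, and g = 4 has prime order q.
P, Q, G = 2039, 1019, 4

def prover_commit():
    r = secrets.randbelow(Q)                 # Prover's secret randomness
    return r, pow(G, r, P)                   # send commitment t = g^r

def prover_respond(x, r, c):
    return (r + c * x) % Q                   # response s = r + c*x mod q

def verifier_accepts(y, t, c, s):
    return pow(G, s, P) == (t * pow(y, c, P)) % P   # check g^s == t * y^c

def run_protocol(x, y):
    """One round between an honest Prover (who knows x) and an honest Verifier."""
    r, t = prover_commit()
    c = secrets.randbelow(Q)                 # Verifier's random challenge
    s = prover_respond(x, r, c)
    return (t, c, s), verifier_accepts(y, t, c, s)

def simulate_transcript(y):
    """Produce an accepting transcript WITHOUT knowing x."""
    c = secrets.randbelow(Q)
    s = secrets.randbelow(Q)
    t = (pow(G, s, P) * pow(y, Q - c, P)) % P   # t = g^s * y^(-c)
    return t, c, s
```

Because simulated transcripts pass the same check as real ones, a recorded conversation alone is no evidence that the Prover knew x.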


3 Digitalizing Voting 


In this section we discuss how the voting procedure can be securely digitalized using
cryptography. Typically, designing a cryptographic protocol consists of clarifying its
purpose and modeling its features, then designing the protocol, and finally verifying
that the designed protocol meets the goals set earlier.
Fig. 4 Model of electronic voting (legitimate voters cast Yes or No votes to a Tallying authority, e.g. 4 Yes and 2 No)


3.1 Requirements for Voting 


So let us clarify the purpose of voting and its desired properties. We assume there
is a list of legitimate voters with their respective public keys, and a Tallying
authority. Each legitimate voter casts either a yes or a no vote, and the Tallying
authority wants to obtain a correct count of the votes (Fig. 4). The three main
requirements we need to meet are the following:


1. Only legitimate voters vote, and one vote per voter. 
2. The Tallying authority cannot announce faulty results.
3. No one can learn how each voter voted. 


3.2 Designing Voting Protocol 


These three requirements seem hard to achieve simultaneously. If we let all legitimate
voters sign their votes, the first requirement can be met. However, if the votes are
signed with the voters' keys, the votes are not anonymous, which conflicts with the
third requirement. If we make all votes anonymous, we cannot verify whether the votes
come from legitimate voters, or, even if they do, whether someone voted more than
once. Moreover, we cannot verify whether the Tallying Authority simply ignored some
of the anonymous votes when counting the tally.

There are several ideas for meeting all three seemingly conflicting requirements. In
this subsection, we discuss one such idea, which uses shuffling [2].
Fig. 5 Overview of voting protocol using shuffling (Digishuff: ballots in double envelopes are shuffled and decrypted, and a zero-knowledge proof shows that the process is correct)


The underlying idea comes from how these requirements are met with paper ballots.
In one province, a voter fills in a paper ballot and puts it in a blank envelope. The
voter then puts this blank envelope in a larger envelope and signs it with the voter's
name. The voter hands this envelope to the Tallying Authority. The Authority can
verify that the voter is legitimate and has handed in only one envelope, but because
the ballot is inside an envelope, the Authority cannot learn the vote. How about
counting? On the day of counting, all the outer envelopes are removed, leaving the
ballots in blank inner envelopes. All the blank envelopes are thrown on a table and
shuffled manually, so that no one learns which inner envelope came from which outer
envelope. After adequate shuffling, the inner envelopes are opened and the ballots
inside are counted. The whole procedure is supervised by an observer, so that the
Tallying Authority cannot cheat while shuffling or opening the envelopes. This trick
can be carried over to the digital setting (Fig. 5).

We therefore encrypt each ballot under a public key of the system to mimic the blank
inner envelope. As the outer envelope, each voter signs the encrypted ballot and casts
it to the Tallying Authority. From the signature on the encrypted ballot, the Authority
learns that the ballot comes from a legitimate voter and that the same voter has not
voted more than once, but it cannot see the ballot itself because it is encrypted. The
Authority then removes the digital signatures and ‘shuffles’ the encrypted ballots.
After the encrypted ballots have been well mixed, that is, once it has become difficult
to tell who submitted which encrypted ballot, the ballots are decrypted to enable
tallying. In this way, we ensure that each legitimate voter's vote is counted exactly
once, and the Authority does not learn how any individual voted as long as the
decryption keys are kept safe. To show that it tallied correctly, the Authority provides
zero-knowledge interactive proofs that it followed the procedure correctly and that
the result of the tally is trustworthy. In the next subsection, we discuss in more detail
how we ‘shuffle’ digital data.

Fig. 6 Permutation is not shuffling (merely reordering the encrypted data of Alice, Bob, Chris, Eva and Dave leaves each item easy to trace back)
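The Authority's ‘outer envelope’ check can be sketched as follows. This is a minimal illustration under stated assumptions: the submission format, the registry mapping voter IDs to public keys, and the verify callback are hypothetical placeholders for a real signature scheme. The first validly signed ballot from each registered voter is kept, and only the encrypted ballots are passed on to shuffling.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# (voter_id, encrypted_ballot, signature): a hypothetical submission format
Submission = Tuple[str, bytes, bytes]

def accept_ballots(submissions: Iterable[Submission],
                   registry: Dict[str, object],
                   verify: Callable[[bytes, bytes, object], bool]) -> List[bytes]:
    """Return encrypted ballots that pass the outer-envelope checks,
    with voter identities and signatures stripped off."""
    seen = set()
    accepted: List[bytes] = []
    for voter_id, ballot, sig in submissions:
        pub = registry.get(voter_id)
        if pub is None:                    # not a legitimate voter
            continue
        if voter_id in seen:               # one vote per voter
            continue
        if not verify(ballot, sig, pub):   # signature must check out
            continue
        seen.add(voter_id)
        accepted.append(ballot)            # keep only the inner envelope
    return accepted
```

A design choice worth noting: a submission with a bad signature does not use up the voter's single vote, so a later, correctly signed ballot is still accepted.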


3.3 Shuffling Encrypted Data Using Probabilistic Encryption 


If ‘shuffling digital data’ simply meant changing the positions of data items, then
even after shuffling it would be easy to tell which item came from whom, by matching
the bit patterns (Fig. 6).
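Why a plain permutation is not enough can be seen in a few lines: anyone who saw the input list can trace every output item back by matching bit patterns. The strings below are illustrative stand-ins loosely modelled on Fig. 6, and trace_back is a hypothetical helper.

```python
import random

def trace_back(inputs, outputs):
    """inputs: list of (voter, data); outputs: the same data items, reordered.
    Returns the voter behind each output position by matching bit patterns."""
    owner = {data: voter for voter, data in inputs}
    return [owner[data] for data in outputs]

inputs = [("Alice", "KE9SLIWEL"), ("Bob", "SJAJIWE54S"), ("Chris", "GKX3RPB9U"),
          ("Eva", "QKS769WML"), ("Dave", "GR83F80BUY")]
outputs = [data for _, data in inputs]
random.shuffle(outputs)            # "shuffling" by permutation only
print(list(zip(trace_back(inputs, outputs), outputs)))
```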

So in digital shuffling we need to change the look of the data. For this purpose, we
use a public key encryption scheme that is probabilistic [3]. That is, the encryption
function is non-deterministic, so there are many ciphertexts that decrypt to the same
message. Changing ‘the look’ of encrypted data therefore means replacing a ciphertext
with another ciphertext that decrypts to the same message. Figure 7 illustrates such a
shuffling procedure. First, the list of encrypted ballots is permuted. Then each
encrypted ballot is replaced with another ciphertext without changing the content of
the ballot. Comparing the input list with the output list, it is difficult to trace which
ballot was shuffled to which position.

An example of a probabilistic encryption scheme that offers this property is ElGamal
encryption [4]. Here we give an overview of the scheme. ElGamal encryption is based
on the assumption that, given a prime p, a generator g of $\mathbb{Z}_p^*$, and
$y = g^a \bmod p$ for a randomly chosen exponent a, it is difficult to compute a from
(p, g, y). This is called the Discrete Logarithm Problem. The KeyGeneration function
for ElGamal encryption therefore generates a prime p of length k (the security
parameter), a generator g, and $y = g^a \bmod p$ for a randomly chosen a. The public
key is (p, g, y), and the exponent a serves as the secret key. The encryption function,
on input a message $m \in \mathbb{Z}_p^*$ and the public key (p, g, y), generates a
random number r and outputs

$$(c_1, c_2) = (g^r \bmod p,\ m \cdot y^r \bmod p)$$

as a ciphertext of m. On input $(c_1, c_2)$ and the secret key a, the decryption
function computes $c_2 / c_1^{a} \bmod p$, which equals the message m if the ciphertext
was correctly conveyed, since $c_1^{a} = g^{ra} = y^{r}$. To change the look of
$(c_1, c_2)$, we compute

$$(d_1, d_2) = (c_1 \cdot g^s \bmod p,\ c_2 \cdot y^s \bmod p)$$

for a randomly chosen s. Because $(d_1, d_2) = (g^{r+s} \bmod p,\ m \cdot y^{r+s} \bmod p)$,
this is a different-looking ciphertext that also decrypts to m. Interestingly, this
transformation can be performed without knowledge of the secret key.

Fig. 7 Shuffling procedure (the input list of encrypted ballots of Alice, Bob, Chris, Eva and Dave is permuted, then each ciphertext is re-encrypted to change its look)
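The ElGamal operations and a shuffle that first permutes and then re-encrypts can be sketched as follows. This is a minimal sketch with toy parameters (p = 2039, q = 1019, g = 4, far too small for real use) and a hypothetical encoding of Yes/No ballots as group elements; it works in the order-q subgroup generated by g, and it omits the zero-knowledge proofs of correct shuffling described in the text. Note that reencrypt uses only the public key y.

```python
import secrets

# Toy parameters (assumption, insecure size): p = 2q + 1 with q prime,
# and g = 4 generates the subgroup of order q.
P, Q, G = 2039, 1019, 4
YES, NO = G, (G * G) % P   # hypothetical encoding of ballots as group elements

def key_gen():
    a = secrets.randbelow(Q - 1) + 1                   # secret exponent a
    return a, pow(G, a, P)                             # (secret a, public y = g^a mod p)

def encrypt(m, y):
    r = secrets.randbelow(Q)
    return pow(G, r, P), (m * pow(y, r, P)) % P        # (g^r, m * y^r)

def decrypt(c, a):
    c1, c2 = c
    return (c2 * pow(c1, P - 1 - a, P)) % P            # c2 / c1^a, via Fermat

def reencrypt(c, y):
    c1, c2 = c
    s = secrets.randbelow(Q)
    return (c1 * pow(G, s, P)) % P, (c2 * pow(y, s, P)) % P   # public key only

def shuffle(ciphertexts, y):
    out = list(ciphertexts)
    secrets.SystemRandom().shuffle(out)                # step 1: permute
    return [reencrypt(c, y) for c in out]              # step 2: change the look
```

After mixing, the key holder could count the Yes votes with, for example, `sum(decrypt(c, a) == YES for c in mixed)`.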


4 Bitcoin Blockchain 


Perhaps one of the most impressive digital transformations achieved through
cryptography is the digitalization of ‘money’ called Bitcoin [5]. There are many
prepaid electronic money systems today, such as PayPay, but each is restricted to one
currency and is operated by an accountable organization. Satoshi Nakamoto designed
a system in which algorithms alone ensure the correctness of money transfers, without
any centralized authority. We provide below an overview of his design, omitting some
details for the sake of simplicity.
Fig. 8 Data managers and transaction logs (Blockchain: signed transaction data is given to the ledger layer, and valid signed data is propagated among multiple nodes by peer-to-peer communication)


4.1 Modeling Blockchain 


Blockchain is the technology used to manage transaction data in Bitcoin. Users of
Bitcoin issue transaction data, typically saying “send x Bitcoin from my account yyy to
the address zzz.” The transaction is accepted if the message was indeed sent by the
owner of account yyy and at least x Bitcoin remain in that account. Once the
transaction has been accepted, the transaction log implies that x Bitcoin should be
deducted from account yyy and added to account zzz. Unlike previous systems, in
which one organization keeps a record of all transactions, Bitcoin has multiple
voluntary “Data managers”, known as Full Nodes, connected in a peer-to-peer fashion.
When a user issues a transaction, Data managers check its correctness and propagate
it to other Data managers. The ideal goal is that all Data managers keep these
transaction logs consistently (Fig. 8). However, because transaction logs are created
by account holders all over the internet, and communication over a peer-to-peer
network is not always perfect, there is no guarantee that the logs are consistent among
all Data managers. The big problem Satoshi had to solve was therefore how to
synchronize the transaction log among Data managers connected by an asynchronous
peer-to-peer network.
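The acceptance rule for a single transaction can be sketched on the simplified account model described above. This is an assumption-laden illustration: Bitcoin itself tracks unspent transaction outputs rather than account balances, the Transaction layout is hypothetical, and the verify callback stands in for real signature verification.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass(frozen=True)
class Transaction:
    sender: str        # account yyy
    recipient: str     # address zzz
    amount: int        # x, in integer units
    signature: bytes

def apply_transaction(balances: Dict[str, int], tx: Transaction,
                      verify: Callable[[Transaction], bool]) -> bool:
    """Accept tx only if it is signed by the sender's owner and funds suffice."""
    if tx.amount <= 0 or not verify(tx):
        return False
    if balances.get(tx.sender, 0) < tx.amount:
        return False
    balances[tx.sender] -= tx.amount                   # deduct x from yyy
    balances[tx.recipient] = balances.get(tx.recipient, 0) + tx.amount  # add x to zzz
    return True
```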
Fig. 9 Crypto puzzles for synchronization (Blockchain: generating a ‘block’ = solving a crypto puzzle)


4.2 Crypto Puzzle for Synchronization 


A core idea behind synchronization is to limit how frequently transactions are
distributed. If distribution happens infrequently, for example once every 10 min or so,
the peer-to-peer network should have enough time to share the same data. To achieve
this, the Bitcoin blockchain bundles a batch of transaction logs into a block, and a
block cannot be distributed among Data managers unless it is accompanied by the
solution of a crypto puzzle tied to the content of that block. The crypto puzzle is
designed so that the puzzle for any block can be solved with high probability, but
solving it is time consuming. While the puzzle is hard to solve, it is easy for other
Data managers to verify that a solution is correct (Fig. 9).

To define the crypto puzzle, we use a mathematical function called a hash function.
A hash function deterministically maps an input string of arbitrary length to a
fixed-length integer of, say, 256 bits. The output is called a hashed value. With a
cryptographically secure hash function, it is computationally difficult to find two
different inputs that map to the same hashed value. There are known algorithms that
are believed to achieve this property, such as SHA-256 [6].
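Python's standard hashlib module provides SHA-256; reading the 32-byte digest as a big-endian integer gives the 256-bit hashed value that the puzzle compares against a threshold. The helper name hash_to_int is our own.

```python
import hashlib

def hash_to_int(data: bytes) -> int:
    """SHA-256 as a map from arbitrary-length byte strings to 256-bit integers."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

# The published SHA-256 test vector for b"abc" begins 0xba7816bf.
print(hex(hash_to_int(b"abc")))
```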

Let us assume a Data Manager wants to add a batch of data $D_1, \ldots, D_m$ on top of
the latest block $B_n$. The puzzle is to find a string str that satisfies the following
inequality:

$$\mathrm{Hash}(\mathrm{Hash}(B_n) \,\|\, D_1 \,\|\, \cdots \,\|\, D_m \,\|\, str) < 2^{256 - d_n}$$

where $\|$ denotes concatenation of strings and $d_n$ is an integer derived from the
previous block $B_n$, called the difficulty. A typical output of the hash function is a
256-bit integer, so only about one candidate in $2^{d_n}$ satisfies the inequality; if
$d_n$ is about 60, one needs to try a huge number of possible strings str. The
difficulty is set so that this trial-and-error process takes about 10 min on average to
find a suitable str.
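A toy version of the puzzle and its verification is sketched below, assuming the threshold is 2^(256 − d) for difficulty d, with a deliberately tiny d = 12 (about 4,096 hash evaluations on average). Real Bitcoin applies double SHA-256 to a fixed-format block header rather than hashing the raw concatenation used here, and the function names are hypothetical.

```python
import hashlib
from itertools import count
from typing import List

def H(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def puzzle_value(prev_block: bytes, data: List[bytes], s: bytes) -> int:
    # Hash(Hash(B_n) || D_1 || ... || D_m || str)
    prev_hash = hashlib.sha256(prev_block).digest()
    return H(prev_hash + b"".join(data) + s)

def solve(prev_block: bytes, data: List[bytes], d: int) -> bytes:
    target = 1 << (256 - d)                  # 2^(256 - d)
    for i in count():                        # trial and error over candidate strings
        s = str(i).encode()
        if puzzle_value(prev_block, data, s) < target:
            return s

def check(prev_block: bytes, data: List[bytes], s: bytes, d: int) -> bool:
    # Verification costs a single hash evaluation.
    return puzzle_value(prev_block, data, s) < (1 << (256 - d))
```

The asymmetry the text relies on is visible here: solve needs about 2^d hash evaluations on average, while check needs one.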

The data $D_1, \ldots, D_m$, accompanied by a correct puzzle solution str, is then
propagated among Data Managers as a new block. Other Data Managers who receive the
block verify the correctness of the solution. If it is correct, they add this block on top
of the previous blocks, extending the chain that stores the data. They then try to solve
the next puzzle based on the new block, using other transaction logs that have not yet
been stored in the blockchain.
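How a receiving Data manager might extend its chain can be sketched as follows, with a hypothetical Block layout and a toy difficulty: a block is accepted only if it points to the hash of the current last block and carries a valid puzzle solution.

```python
import hashlib
from dataclasses import dataclass
from itertools import count
from typing import List, Tuple

DIFFICULTY = 12   # toy difficulty d (assumption); real values are far larger

def sha256(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

@dataclass(frozen=True)
class Block:
    prev_hash: bytes            # Hash(B_n) of the block this one extends
    data: Tuple[bytes, ...]     # D_1, ..., D_m
    solution: bytes             # the puzzle solution str

    def encode(self) -> bytes:
        return self.prev_hash + b"".join(self.data) + self.solution

def solved(block: Block) -> bool:
    value = int.from_bytes(sha256(block.encode()), "big")
    return value < (1 << (256 - DIFFICULTY))

def mine(prev: Block, data: Tuple[bytes, ...]) -> Block:
    prev_hash = sha256(prev.encode())
    for i in count():
        candidate = Block(prev_hash, data, str(i).encode())
        if solved(candidate):
            return candidate

def try_append(chain: List[Block], block: Block) -> bool:
    """Check the link to the current last block and the puzzle, then append."""
    if block.prev_hash != sha256(chain[-1].encode()) or not solved(block):
        return False
    chain.append(block)
    return True
```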


4.3 Incentives for Data Managers 


We conclude the overview of the Bitcoin blockchain by explaining why the Data
managers spend computational effort to solve otherwise meaningless puzzles. A Data
manager is rewarded with Bitcoin if it solves the puzzle and its block is followed by
future blocks. This incentive plays a central role in keeping data consistent among
Data managers and in deterring them from behaving maliciously.


5 Concluding Remarks 


In this article we have discussed several examples of securely implementing current
social activities in the cyber world using cryptography. We have shown how some
cryptographic primitives are defined mathematically. The procedure for designing
secure protocols begins with clarifying the goals and requirements, and then designing
the protocol to meet those criteria. Although these examples show that cryptography
is a promising approach, we still lack the technology to mathematically model and
evaluate an overall system for digital transformation. The author sincerely hopes that
this article will encourage researchers in mathematics, cryptography and information
technology to come together and share their strengths toward making our digital
society a more secure and fair place.


References 


1. Goldreich, O.: Foundations of Cryptography. Cambridge University Press (2009)

2. Furukawa, J., Mori, K., Sako, K.: An implementation of a mix-net based network voting
scheme and its use in a private organization. In: Towards Trustworthy Elections, pp. 141-154
(2010)

3. Goldwasser, S., Micali, S.: Probabilistic encryption. J. Comput. Syst. Sci. 28(2), 270-299 
(1984) 


Cryptography and Digital Transformation 171 


4. ElGamal, T.: A public key cryptosystem and a signature scheme based on discrete logarithms.
IEEE Trans. Inform. Theory 31(4), 469-472 (1985)

5. Nakamoto, S.: Bitcoin: A Peer-to-Peer Electronic Cash System (2009). https://bitcoin.org/bitcoin.pdf

6. NIST FIPS 180-4: Secure Hash Standard (SHS) 


Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 
International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, 
adaptation, distribution and reproduction in any medium or format, as long as you give appropriate 
credit to the original author(s) and the source, provide a link to the Creative Commons license and 
indicate if changes were made. 

The images or other third party material in this chapter are included in the chapter’s Creative 
Commons license, unless indicated otherwise in a credit line to the material. If material is not 
included in the chapter’s Creative Commons license and your intended use is not permitted by 
statutory regulation or exceeds the permitted use, you will need to obtain permission directly from 
the copyright holder. 


Efficient Algorithms for Tracking Moving
Interfaces in Industrial Applications:
Inkjet Plotters, Electrojetting, Industrial
Foams, and Rotary Bell Painting


Maria Garzon, Robert I. Saye, and James A. Sethian 


Abstract Moving interfaces are key components of many dynamic industrial processes,
in which complex interface physics determine much of the underlying action and
performance. Level set methods, and their descendants, have been valuable in
providing robust mathematical formulations and numerical algorithms for tracking the
dynamics of these evolving interfaces. In manufacturing applications, these methods
have shed light on a variety of industrial processes, including the design of industrial
inkjet plotters, the mechanics of electrojetting, shape and evolution in industrial foams,
and rotary bell devices in automotive painting. In this review, we discuss some of those
applications, illustrating shared algorithmic challenges, and show how to tailor these
methods to meet those challenges.


Moving interfaces are key components of many dynamic industrial processes, 
whose dynamics are critical to the underlying physics. Examples include turbines, 
flames and combustion, plastic injection molding, microfluids, and pumping. In each 
of these examples, complex physics at the interface, such as between a fluid and a 
moving wall, or through a membrane or a transition region, determines much of the 
underlying action and performance (Fig. 1). 

One approach to propagating interfaces is given by “level set methods”. These
algorithms track interfaces in multiple dimensions, couple the driving physics with the
interface in a natural way, and smoothly handle topological changes due to merging
and breaking. They accurately and robustly compute high-order solutions to moving
interface problems, and are easily discretized using standard techniques, such as finite
difference, finite element, and discontinuous Galerkin methods.

Fig. 1 Examples of industrial interfaces (panels: turbines, combustion, dendrite evolution, fluid mixing)

This paper reviews the application of these methods to several industrial problems,
drawing on multiple sources [10-17, 26-29, 42-44] to discuss the design of industrial
inkjet plotters, jetting and electrojetting devices, industrial foams, and rotary bell spray
devices. Rather than focusing extensively on the equations or the algorithms, we
provide an overview of the approaches, with an emphasis on the results. References
are provided for more in-depth discussions.


1 Modeling Interface Evolution Using Level Set Methods 


Level set methods, introduced in [19], have been used in a large number of applications
to track moving interfaces. They are based on both a general mathematical theory and
a robust numerical methodology, which relies on exchanging the typical Lagrangian
perspective on front propagation, in which the front is explicitly tracked, for an
Eulerian view in which the moving interface is embedded as a particular level set of a
higher-dimensional function posed in a fixed coordinate system. The motion of the
interface corresponds to solving the evolution of this higher-dimensional function
according to a Hamilton-Jacobi-type initial value partial differential equation.

A brief summary is as follows. Consider a moving interface $\Gamma(t)$ of dimension
$N - 1$. We restrict ourselves to interfaces which are closed and simple, and which
separate the domain into an “inside” and an “outside”. We recast the problem by
implicitly defining the $(N - 1)$-dimensional moving interface $\Gamma(t)$ as the zero
level set of the solution to an evolving level set function $\phi(x, t)$,
$\phi: \mathbb{R}^N \times [0, \infty) \to \mathbb{R}$, which satisfies a time-dependent
partial differential equation. There are many ways to initialize this implicit function:
one approach is to let $\phi(x, t = 0)$ be the signed distance from the interface
$\Gamma(t = 0)$, linking the interface to the zero level set.

We assume that the underlying physics specifies a speed F normal to the interface 
at every point on the interface. Constructing this speed function typically involves 
solving complex physics both on and off the interface. 
Thus, there are two embeddings. First, the interface itself is embedded and implicitly
defined through a higher-dimensional function $\phi$. Second, to move the other level
sets, we embed the speed F in a higher-dimensional function, known in the literature
as the “extension velocity” $F_{\mathrm{ext}}$, which coincides with the given speed on
the zero level set corresponding to the interface.


1.1 Equations of Motion 


Here, we review the basic ideas behind the derivation and implementations of level 
set methods. We follow the derivation and discussion in [35, 36]. 

We wish to produce an Eulerian formulation for the motion of a hypersurface $\Gamma$
representing the interface and propagating along its normal direction with speed F,
where F can be a function of various arguments. Let $\pm d(x)$ be the signed distance
from the point $x \in \mathbb{R}^N$ to the interface at time $t = 0$. Define a function
$\phi(x, t = 0)$ by the equation

$$\phi(x, t = 0) = \pm d(x). \tag{1}$$

Requiring that the zero level set of the evolving $\phi$ (see Fig. 2, left) always match
the propagating hypersurface means that, for every point $x(t)$ on the propagating
interface,

$$\phi(x(t), t) = 0. \tag{2}$$


Fig. 2 Left: Implicit embedding of level set function. Transformation of front motion into an initial value problem: an implicitly defined surface $\phi$, whose ensuing motion satisfies Eq. (3), and whose zero level set always matches the motion of the interface. Right: Topological change. The level surface $\phi$ is shown in red. Top: $\phi = 0$ corresponds to two separate initial fronts. Bottom: later in time, the interface topology has changed, yielding a single curve as the zero level set.
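By the chain rule, $\phi_t + \nabla\phi(x(t), t) \cdot x'(t) = 0$. Since
$x'(t) \cdot n = F_{\mathrm{ext}}$, where $n = \nabla\phi / |\nabla\phi|$ is the unit normal
and $F_{\mathrm{ext}}$ is the extension velocity, we have
$\nabla\phi \cdot x'(t) = |\nabla\phi| \, F_{\mathrm{ext}}$, which yields an evolution
equation for $\phi$, namely,

$$\phi_t + F_{\mathrm{ext}} |\nabla\phi| = 0, \quad \text{given } \phi(x, t = 0). \tag{3}$$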


This is the level set equation introduced by Osher and Sethian [19]. Propagating 
fronts can develop shocks and rarefactions in the gradient, corresponding to corners 
and fans in the evolving interface, and numerical techniques designed for hyperbolic 
conservation laws can be exploited to construct schemes which produce the correct, 
physically reasonable entropy solution, see [32-34]. 
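To make the upwind idea concrete, here is a minimal sketch of a textbook first-order upwind scheme for Eq. (3) on a uniform 2D grid. It is our illustration, not the authors' production code, and it assumes forward Euler time stepping, edge-copying boundaries, and a constant speed F = 1 applied to an expanding circle of initial radius 0.5; the exact zero level set then has radius 0.5 + t.

```python
import numpy as np

def level_set_step(phi, F, h, dt):
    """One forward-Euler step of phi_t + F |grad phi| = 0 using first-order
    upwind differences (F may be a scalar or an array shaped like phi)."""
    p = np.pad(phi, 1, mode="edge")          # copy edge values outward
    c = p[1:-1, 1:-1]
    dxm = (c - p[1:-1, :-2]) / h             # backward difference in x (axis 1)
    dxp = (p[1:-1, 2:] - c) / h              # forward difference in x
    dym = (c - p[:-2, 1:-1]) / h             # backward difference in y (axis 0)
    dyp = (p[2:, 1:-1] - c) / h              # forward difference in y
    grad_plus = np.sqrt(np.maximum(dxm, 0) ** 2 + np.minimum(dxp, 0) ** 2
                        + np.maximum(dym, 0) ** 2 + np.minimum(dyp, 0) ** 2)
    grad_minus = np.sqrt(np.minimum(dxm, 0) ** 2 + np.maximum(dxp, 0) ** 2
                         + np.minimum(dym, 0) ** 2 + np.maximum(dyp, 0) ** 2)
    # Pick the upwind gradient according to the sign of F.
    return phi - dt * (np.maximum(F, 0) * grad_plus + np.minimum(F, 0) * grad_minus)

# Expanding circle: phi(x, 0) = signed distance to a circle of radius 0.5, F = 1.
n = 201
x = np.linspace(-2.0, 2.0, n)
h = x[1] - x[0]
X, Y = np.meshgrid(x, x)                     # X varies along axis 1, Y along axis 0
phi = np.sqrt(X ** 2 + Y ** 2) - 0.5
dt, steps = 0.25 * h, 100                    # 2 * dt * |F| / h = 0.5 <= 1 (CFL-type bound)
for _ in range(steps):
    phi = level_set_step(phi, 1.0, h, dt)

row = phi[n // 2]                            # the line y = 0
i = int(np.argmax((x > 0) & (row >= 0)))     # first grid point outside the front
radius = x[i - 1] - row[i - 1] * h / (row[i] - row[i - 1])
print(radius)                                # close to 0.5 + steps * dt = 1.0
```

Choosing grad_plus or grad_minus by the sign of F is what makes the scheme upwind: information is taken from the side the front is coming from, which is what yields the entropy solution at corners.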

There are several advantages to this approach. First, the formulation works in 
any number of dimensions. Second, topological changes are handled without special 
attention: fronts split and merge. Third, geometric quantities along the interface can 
be calculated by taking advantage of the embedding and computing quantities in 
the fixed Eulerian setting. Fourth, this formulation naturally lends itself to numerical 
approximations, for example, through finite difference or finite element formulations 
on the fixed background mesh. 


1.2 Computational Advances 


Since its introduction, a large number of computational advances have been developed
to make this approach efficient, accurate, and economical. These include

• The introduction of adaptive “narrow band level set methods” [1], which confine
computation to a thin band around the zero level set.

• Fast methods to construct extension velocities [2, 25].

• Incorporation of complex physics [3-5], transport of material quantities [6], and
methods to handle multi-phase flows with a large number of distinct propagating
regions coming together in complex junctions, triple points, etc. [26, 27].


A large number of reviews have appeared over the years, containing these and many
related ideas. We refer the interested reader to [20, 30, 35, 36, 38-40].


2 Industrial Printing 


2.1 Physical Problem and Modeling Goals 


Industrial inkjet printing involves ejecting ink housed in a well through a narrow
nozzle; the ink is then deposited on a material. The ink in the bath is expelled by an
electro-actuator mechanism at the bottom, which quickly propels ink through the
nozzle. The shape of the nozzle, the force and timing of the actuator, and the
properties of the ink are instrumental in determining the ultimate shape, delivery,
and performance of the printing device.

This is a two-phase incompressible fluid flow problem, with the interface separating
air and ink. Depending on the composition of the ink, the flow can be either
Newtonian or viscoelastic. Boundary conditions include both no-slip and no-flow at
solid walls, and triple points where air-ink boundaries meet solid nozzle walls are
subject to typical critical-angle dynamics controlling slipping. While a common use
for inkjet printers is commercial home printing, over the past two decades a large
number of sophisticated industrial applications have appeared, ranging from printing
integrated circuits and manufacturing display devices through to constructing tissue
scaffolding and layered manufacturing.

The goal of numerical simulation is to identify and optimize key aspects of the
process, including

• Optimizing the design of the nozzle and controlling the actuator mechanism to aim,
extend, and focus droplet delivery;

• Characterizing the effect of wall wetting/non-wetting on the shape and separation of
droplets;

• Determining and perhaps minimizing the formation of secondary trailing droplets,
which break off from the main ejected bubble as the fluid elongates, due to the
effects of surface tension; and

• Understanding how variations in viscosities and impurities affect droplet dynamics.


2.2 Equations of Motion and Computational Challenges 


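We solve for incompressible flow in a non-rectangular geometry, with no-slip and
no-flow conditions on the walls, with air satisfying Newtonian flow and ink satisfying
a viscoelastic Oldroyd-B model. The equations of motion [42-44] are given by

$$\text{(Ink)}\quad \rho_1 \frac{D\mathbf{u}_1}{Dt} = -\nabla p_1 + \nabla\cdot(2\mu_1 \mathbf{D}_1) + \nabla\cdot\boldsymbol{\tau}_1, \qquad \nabla\cdot\mathbf{u}_1 = 0,$$
$$\frac{D\boldsymbol{\tau}_1}{Dt} - (\nabla\mathbf{u}_1)\,\boldsymbol{\tau}_1 - \boldsymbol{\tau}_1(\nabla\mathbf{u}_1)^T = -\frac{1}{\lambda_1}\left(\boldsymbol{\tau}_1 - 2\mu_{p_1}\mathbf{D}_1\right), \tag{4}$$
$$\text{(Air)}\quad \rho_2 \frac{D\mathbf{u}_2}{Dt} = -\nabla p_2 + \nabla\cdot(2\mu_2 \mathbf{D}_2), \qquad \nabla\cdot\mathbf{u}_2 = 0, \tag{5}$$
$$\mathbf{D}_i = \tfrac{1}{2}\left[\nabla\mathbf{u}_i + (\nabla\mathbf{u}_i)^T\right], \qquad \mathbf{u}_i = (u_i, v_i), \qquad i = 1, 2, \tag{6}$$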


where, for the ink, $\boldsymbol{\tau}_1$ is the viscoelastic stress tensor, $\lambda_1$ is the
viscoelastic relaxation time, and $\mu_{p_1}$ is the solute dynamic viscosity; subscript 2
refers to (Newtonian) air. We use a level set method to track the air-ink interface,
starting with the initial pressure disturbance in the reservoir: the fluid then moves
through the nozzle, is ejected into the ambient air, and may separate into one or more
droplets. We compute an approximate solution to the incompressible Navier-Stokes
equations given above in both phases simultaneously, with the surface tension terms
mollified and placed on the right-hand side as a forcing term. Thus, the solution
accounts for the ink velocity, the air-ink interface, and the air currents induced by the
fluid ejection. We use a second-order projection method [7-9] on a body-fitted,
logically rectangular mesh. Calculations are performed both in axisymmetric two
dimensions and in full three dimensions. For details, see [42-44]. Figure 3 shows the
results of both an experiment and a simulation.

Fig. 3 Left: Experimental profiles, showing ejected ink and satellite formation; note the formation
of the trailing satellite droplet as the initial bubble stretches and changes topology. Right: simulation
of full ejection cycle (taken from [43]). Inflow pressure from an equivalent circuit model which
describes the cartridge, supply channel, vibration plate, PZT actuator, and applied voltage. Fluid
is an Epson dye-based ink, with critical advancing contact angle $\theta_a = 70^\circ$ and receding
contact angle $\theta_r = 30^\circ$, and with $\rho_1 = 1070$ kg/m$^3$, $\mu_1 = 3.34 \times 10^{-3}$ kg/(m s),
and $\sigma = 0.032$ kg/s$^2$. The nozzle geometry has diameter 26 μm at the opening and 65 μm at the bottom


3 Droplet Formation and Electro-jetting 


3.1 Physical Problem and Modeling Goals 


A large number of industrial problems involve microjetting and droplet dynamics, 
in which small droplets both move through small structures and also transport key 
materials, for example, in such areas as deposition of evaporation substances, delivery 
of biological materials, and substance separation. 

Part of the challenge in computing these problems stems from the critical role of 
surface tension and shear forces, which often drive topological change, breakage, 
and merger in the evolving droplets. Level set methods, because of their ability to 
handle these structural changes, are particularly well-suited for computing droplet 
dynamics. Here, we summarize work on microjetting dynamics first presented in
[10]; see also [11-17].
Consider the dynamics of a thin tube of fluid as it pinches off due to surface tension
effects at a narrowing neck of the fluid (see Fig. 5), where mean curvature drives the
interface inward until it breaks into two separate lobes of fluid. The pinch-off
dynamics reveal considerable intricacy: as the droplet breaks, rapidly moving
capillary waves on the surface cause instabilities and oscillations in the fluid lobes.


3.2 Equations of Motion 


Following the arguments in [11, 12], we model the fluid as incompressible and 
irrotational with a potential flow formulation. Euler’s equation gives 


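$$\nabla\cdot\mathbf{u} = 0 \quad \text{in } \Omega(t), \tag{7}$$
$$\mathbf{u}_t + \mathbf{u}\cdot\nabla\mathbf{u} = -\frac{\nabla p}{\rho} + \text{body forces} \quad \text{on } \Gamma_t(s). \tag{8}$$

Assuming irrotationality ($\nabla\times\mathbf{u} = 0$), the problem can then be written in
terms of a fluid velocity potential $\mathbf{u} = \nabla\psi$, namely

$$\Delta\psi = 0 \quad \text{in } \Omega(t), \tag{9}$$
$$\psi_t + \frac{1}{2}|\nabla\psi|^2 + \frac{p - p_a}{\rho} = 0 \quad \text{on } \Gamma_t(s), \tag{10}$$

where $p_a$ is the atmospheric pressure and $\rho$ is the fluid density.
As shown in [11, 12], using the Young-Laplace condition
$p - p_a = \gamma\,(1/R_1 + 1/R_2)$ on the free surface, this can be reformulated as

$$\mathbf{u} = \nabla\psi \quad \text{in } \Omega(t), \qquad \Delta\psi = 0 \quad \text{in } \Omega(t), \tag{11}$$
$$\frac{D\psi}{Dt} = \frac{1}{2}|\nabla\psi|^2 - \frac{\gamma}{\rho}\left(\frac{1}{R_1} + \frac{1}{R_2}\right) \quad \text{on } \Gamma_t(s), \tag{12}$$

where $\Omega(t)$ is the fluid tube, $\Gamma_t(s)$ is the boundary of the tube, $R_1$ and $R_2$ are the
principal radii of curvature, and $\gamma$ is the surface tension.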

Although the potential $\psi$ is only defined on the interface, our plan is to build
an extension of both the potential and the interface to all of space, so that we can 
then employ the level set methodology. This embedded implicit formulation then 
allows calculation of the fluid interface motion through pinch off, and can compute 
dynamics of the split fluid lobes. 

These embeddings produce a new set of equations, namely 
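$$\mathbf{u} = \nabla\psi \quad \text{in } \Omega(t),$$
$$\Delta\psi(r, z) = 0 \quad \text{in } \Omega(t),$$
$$\phi_t + \mathbf{u}_{\mathrm{ext}}\cdot\nabla\phi = 0 \quad \text{in } \Omega_D,$$
$$G_t + \mathbf{u}_{\mathrm{ext}}\cdot\nabla G = f_{\mathrm{ext}} \quad \text{in } \Omega_D.$$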

For details about the derivation of these equations, see [10-12].


3.3 Computational Challenges 


The computational challenges that stem from these equations of motion lie in part in
the delicate, sharp singularity at pinch-off. The curvature becomes very large, and as
soon as pinch-off occurs, the two pieces of the neck retract very quickly. Constructing
correct extension values for the velocity and the potential also requires care.

We solve these equations through a time-cycle. Given values for the embedded 
implicit potential and level set function on a fixed background mesh, we construct 
the zero level set corresponding to the interface, place boundary element nodes on 
that interface, and then employ a boundary element method to find the new potential 
and associated velocity field, suitably extended. These nodes are then discarded, and 
the discrete grid values for the level set function, potential, and velocity are updated. 


3.4 Example Results 


Extensive numerical experiments are given in [12, 14]: the self-similar behavior of
some variables near pinch-off time is checked within the computations, and the
computed scaling exponents agree with reported experimental and theoretical values.
Here we review those results. Figure 4 shows a snapshot after pinch-off, revealing
capillary surface waves on the undulating surface. Figure 5 shows the fine-scale
structure of droplet dynamics after pinch-off.


3.5 Charged Droplet Separation 


The above situation becomes considerably more complicated when the droplets are
electrically charged, in which case the droplet motion is driven by a background electric
field. Applications include electrospray ionization, electrospinning to produce fibers
by drawing charged threads of polymers, particle deposition for nanostructures, drug 
delivery systems, and electrostatic rotary bell painting. 
Fig. 4 Droplet dynamics. Left, experiment taken from [41]. Right, level set calculation of surface
Fig. 5 Simulation: fine-scale structure of droplet dynamics after pinch-off [12]


Fig. 6 Experimental profile of electrically charged droplet motion [18] 


The fundamental mechanism relies on the motion of an electrically conductive
liquid in an electric field. The droplets begin to deform under the action
of the electric field; afterwards, the competition between inertial, surface tension, and
electric forces drives the dynamics (see Fig. 6).
Fig. 7 Equations for electrically charged droplet motion. Note: In the shown equations, the velocity
potential is labelled Y but is labelled by V in the main text 


3.6 Equations of Motion and Computational Challenges 


The equations of motion combine the previous potential formulation for droplet
hydrodynamic motion with electrodynamics. We assume a perfectly conducting fluid
surrounded by an unbounded dielectric and exposed to an external uniform force field.
Model equations from [16] are shown in Fig. 7.

Algorithmic challenges include the accurate and reliable computation of the electric
field and the handling of sharp breakup and fast ejection.


3.7 Example Results 


We show a numerical simulation [16] of a free charged droplet carrying a charge
above the critical value, reproducing experimental results before and after jet emission.
Figure 8 shows the focused droplet end from which charged tiny droplets are ejected. 


4 Industrial Foams 


4.1 Physical Problem and Modeling Goals 


Many problems involve the interaction of multiply-connected regions moving 
together. These include the mechanics and architecture of liquid foams, such as 
polyurethane and colloidal mixtures, and of solid foams, such as wood and bone. 
The industrial applications of these problems are manifold. Liquid foams are key 
ingredients in industrial manufacturing, used in fire retardants and in froth flotation 
for separating substances. Solidification of liquid foams results in solid foams, which
have remarkably high compressive strength because of their porous internal structure;
applications include lightweight bicycle helmets and automotive absorbers.
Fig. 8 Time evolution of electrically charged droplet motion, from [16]
Fig. 9 Examples of multiphase problems


In such problems, multiple domains share walls meeting at multiple junctions. 
Boundaries move under forces which depend on both local and global geometric 
properties, such as surface tension and volume constraints, as well as long-range
physical forces, including incompressible flow, membrane permeability, and elasticity.

Foam modeling is made challenging by the vast range of space and time scales 
involved [6]. Consider an open, half-empty bottle of beer. It may seem that nothing 
is happening in the collection of interconnected bubbles near the top, but currents in 
the lamellae separating the air pockets show slow but steady drainage. It can take tens
to hundreds of seconds for the lamella fluid to drain and the film to rupture, triggering
a lamella explosion that retracts at hundreds of centimeters per second, after which the
imbalanced configuration settles into a new stable structure in less than a second.
Spatially, membranes are barely micrometers thick, while large gas pockets can span 
many millimeters or centimeters. All told, the biggest and smallest scales differ by 
roughly six orders of magnitude in space and time. 

Another example comes from metal grain coarsening, in which surface energy,
often associated with temperature changes, drives a system toward larger structures. A
third example comes from foam-formed fiber networks, found both in industrial
materials such as paper and in biological materials such as plant cells and tissues
(Fig. 9).
In all of these engineering problems, understanding how factors such as pocket
formation and distribution, tensile strength, and foam architecture affect behavior is a
key part of designing mechanisms to optimize foam performance.


4.2 Computational Challenges 


Producing good mathematical models and numerical algorithms that capture the 
motion of these interfaces is challenging, especially at junctions where multiple 
interfaces meet, and when topological connections change. Methods have been pro- 
posed, including front tracking, volume of fluid, variational, and level set methods. It 
has remained a challenge to robustly and accurately handle the wide range of possible 
motions of an evolving, highly complex, multiply-connected interface separating a 
large number of phases under time-resolved physics. 

The problem is exacerbated by the nature of the mathematical components that 
contribute to the dynamics, including: velocities dependent on such factors as cur- 
vature, normal directions and anisotropy; the solution of complex PDEs with jump 
conditions, source terms, and prescribed values at the interface and internal bound- 
ary conditions; area and volume-dependent integrals over phases; thermal effects and 
diffusion within phases; and balance of forces at complex junctions. 

From a numerical perspective, some of the challenges stem from the vast time and 
space scales involved. Using in the bulk phases the fine spatial resolution needed to
resolve the physics along interfaces is often impractical. Sharp resolution of the interface
and of front-driven physical quantities located on the interface is required as input to
the bulk PDEs. Accurately resolving interface junctions is critical in order to provide
reliable values for the balance of forces at junctions.

All told, these lead to formidable numerical modeling challenges. 


4.3 Voronoi Implicit Interface Methods 


Voronoi Implicit Interface Methods (VIIM), introduced in [26], provide an accurate,
robust, and reliable way to track multiphase physics and problems with a large number 
of collected, interacting phases. They work in any number of space dimensions, 
represent the complete phase structure by a single function value plus indicator at 
each discretized element of the computational domain, couple easily to complex 
physics, and handle topological change, merger, breakage, and phase extinction in 
a natural manner. The underlying equations of motion that represent the evolving 
interface and complex physics may be approximated in either a finite difference or 
finite element framework. These equations couple level set methods for an evolving 
initial value Hamilton-Jacobi-type partial differential equation to a computational 
geometry-based Eikonal equation to produce a faithful phase representation. Here, 
we provide a brief review of the methods. For details, see [26]. 
The starting point is to consider a collection of non-overlapping phases which
divide up the domain. The “interface” consists of places where these phases meet. In 
two dimensions, the simplest example is a single curve separating two phases. More 
complex structures might have multiple closed curves, each surrounding a separate 
phase, which meet in triple points or higher-order junctions. In three dimensions, the 
situation is far more complex. 

The Voronoi Implicit Interface Method begins by characterizing the entire system
through an implicit representation. For each point $x$ in the plane, define $\phi(x)$ as
the distance to the closest interface. Additionally, define $\chi(x)$ as an integer-valued
function which indicates the phase. By construction, the interface representing all
possible boundaries is given as the zero level set $\{\phi(x) = 0\}$ of this unsigned distance
function, and the indicator function reveals the type of phase.

Thus, for example, if $\phi(x) = 5$ and $\chi(x) = 4$, then we know that the point $x$ is
located in phase 4, and the closest interface point is located a distance 5 away.

Starting with this unsigned distance function representation, we execute a two-step 
process. With interface speed F in the normal direction: 


• Advance $\phi$ through $k$ time steps using the standard level set methodology. That is,
produce $\phi^{n+1}$ from $\phi^n$ by solving a discrete approximation to

$$\phi_t + F|\nabla\phi| = 0.$$

• Use the $\epsilon$ level sets of this time-advanced solution to reconstruct a new unsigned
distance function. This is done by first computing the Voronoi interface from the $\epsilon$
level sets: this is the set of all points that are equidistant from the $\epsilon$ level sets of
at least two different phases and farther from the $\epsilon$ level sets of all other phases.
This Voronoi interface is then used to rebuild the unsigned distance function.
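For the level set step, a standard first-order choice is the Godunov-type upwind discretization of $\phi_t + F|\nabla\phi| = 0$. The sketch below assumes a uniform 2D grid, forward Euler in time, and a given speed array `F` (the function name is hypothetical); it illustrates the textbook scheme and is not claimed to be the discretization used in [26].

```python
import numpy as np


def level_set_step(phi, F, dt, h):
    """One first-order Godunov upwind step of phi_t + F |grad phi| = 0."""
    p = np.pad(phi, 1, mode="edge")  # copy edge values at the boundary (simplification)
    c = p[1:-1, 1:-1]
    dxm = (c - p[1:-1, :-2]) / h  # backward / forward differences in x
    dxp = (p[1:-1, 2:] - c) / h
    dym = (c - p[:-2, 1:-1]) / h  # backward / forward differences in y
    dyp = (p[2:, 1:-1] - c) / h
    # Upwind gradient magnitude for F > 0 (grad_plus) and for F < 0 (grad_minus).
    grad_plus = np.sqrt(np.maximum(dxm, 0) ** 2 + np.minimum(dxp, 0) ** 2
                        + np.maximum(dym, 0) ** 2 + np.minimum(dyp, 0) ** 2)
    grad_minus = np.sqrt(np.minimum(dxm, 0) ** 2 + np.maximum(dxp, 0) ** 2
                         + np.minimum(dym, 0) ** 2 + np.maximum(dyp, 0) ** 2)
    return phi - dt * (np.maximum(F, 0) * grad_plus + np.minimum(F, 0) * grad_minus)
```

Calling this routine $k$ times, with a time step satisfying $\Delta t\,\max|F| \lesssim h/2$, produces the time-advanced field whose $\epsilon$ level sets feed the reconstruction step.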


These two steps give the method its name: “Implicit Interface" because of the level 
set step for the time evolution, and “Voronoi” because of the reconstruction step used 
to rebuild the unsigned distance function and characteristic indicator function. 

There are several things to note: 


• The method works because of a comparison principle which, for a large fraction
of physically reasonable flows built through the use of extension velocities (see
[2]), keeps the zero level set trapped between the neighboring $\epsilon$ level sets. These
$\epsilon$ level sets may be updated for a short period of time without suffering from the
influence of the non-smooth ridge along the zero level set.

• The Voronoi reconstruction can be accomplished without explicit construction
through two applications of fast Eikonal solvers [25, 37].

• Regions spontaneously disappear (appear) if they become small (large) enough that
an $\epsilon$ level set does not exist (can be constructed).

• Careful numerical algorithms can be devised to allow for any non-negative value
of $\epsilon$, including $\epsilon = 0^+$.


For details, see [26, 27]. 
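The reconstruction step can be illustrated with a deliberately brute-force sketch. Given the unsigned distance $\phi$ and phase indicator $\chi$ on a grid, it replaces the two fast Eikonal solves mentioned above with direct distance computations, represents the $\epsilon$ level set of each phase by the grid nodes with $\chi = i$ and $\phi \ge \epsilon$, and approximates the Voronoi interface by midpoints of grid edges whose end nodes receive different labels. These simplifications, and the name `voronoi_reconstruct`, are assumptions made for illustration; the cost grows quadratically with the number of nodes, so it only suits small grids.

```python
import numpy as np


def voronoi_reconstruct(phi, chi, eps, h):
    """Rebuild the unsigned distance phi and phase indicator chi (brute force).

    phi[j, i], chi[j, i] live at (i*h, j*h). The eps level set of phase k is
    represented by the nodes where chi == k and phi >= eps.
    """
    ny, nx = phi.shape
    yy, xx = np.mgrid[0:ny, 0:nx]
    pts = h * np.column_stack([xx.ravel(), yy.ravel()])
    labels = np.unique(chi)
    # d[k, p]: distance from node p to the eps region of phase labels[k].
    d = np.full((labels.size, pts.shape[0]), np.inf)
    for k, lab in enumerate(labels):
        core = pts[((chi == lab) & (phi >= eps)).ravel()]
        if core.size == 0:
            continue  # no eps level set exists, so this phase disappears
        diff = pts[:, None, :] - core[None, :, :]
        d[k] = np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)
    # Each node joins the phase whose eps region is nearest (Voronoi labelling).
    new_chi = labels[d.argmin(axis=0)].reshape(ny, nx)
    # Approximate the Voronoi interface by midpoints of edges whose end nodes
    # received different labels.
    mids = [((i + 0.5) * h, j * h)
            for j, i in zip(*np.nonzero(new_chi[:, :-1] != new_chi[:, 1:]))]
    mids += [(i * h, (j + 0.5) * h)
             for j, i in zip(*np.nonzero(new_chi[:-1, :] != new_chi[1:, :]))]
    if not mids:
        return np.full(phi.shape, np.inf), new_chi  # only one phase remains
    m = np.array(mids)
    diff = pts[:, None, :] - m[None, :, :]
    new_phi = np.sqrt((diff ** 2).sum(axis=2)).min(axis=1).reshape(ny, nx)
    return new_phi, new_chi
```

Because a phase whose $\epsilon$ region is empty never wins the nearest-region comparison, it is absorbed by its neighbors, which mirrors the spontaneous disappearance of small regions noted above.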
Fig. 10 Collapse of a foam cluster, visualized with thin-film interference taken from [29]


4.4 Application of VIIM to Foam Dynamics 


Here, we review some current work applying VIIM to tracking the evolution of liquid 
foams. The vast time and space scales mean that one cannot compute over all scales 
simultaneously. Instead, we use a scale-separation model which allows us to divide 
the foam physics into three distinct stages. 
We characterize the foam structures as represented by thin, interconnected membranes
(lamellae) each surrounding pockets of air, and containing fluid. Membranes
can share common walls, and fluid in each lamella drains toward common, shared 
Plateau borders that form a network of triple junctions and quadruple points. This 
drainage is slow, and once a membrane becomes too thin, it ruptures, causing the 
large air pockets to be out of macroscopic balance, which then readjust according to 
the equations of incompressible flow driven by interfacial forces along the lamellae. 

These events can be thought of as taking place over different scales. The macroscopic
air-fluid incompressible flow phase takes place over the whole domain, and
evolves to an equilibrium relatively quickly. The lamellae drainage phase is slow, but 
takes place only over the very thin membrane walls. Rupture occurs very quickly. 

In [28, 29], these three stages were used to develop a mathematical model and
numerical simulation framework for foam evolution. During the macroscopic stage,
a second order projection method is used to solve the incompressible Navier-Stokes
equations on a rectangular mesh, with the influence of the interface spread onto the
right-hand side through a mollified surface tension term. The individual lamellae are
advanced under the incompressible flow by the Voronoi Implicit Interface Method,
with the internal liquid transported by the method of characteristics. When the motion
has nearly ceased, the model enters a different stage and assumes that the multiphase
configuration has essentially reached equilibrium; a fourth order PDE is then solved
for thin film drainage, approximated through a discretized finite element triangulation.
The final stage results from membrane rupture, idealized as an instantaneous
disappearance of a lamella when a user-chosen minimal thickness is reached, which
then redistributes the lamella liquid mass and sends the configuration into macroscopic
disequilibrium.


4.5 Example Results 


An example of the complete dynamics developed in the multi-scale foam model 
is shown in Fig. 10, which shows the time evolution of a bubble cluster, starting 
from 26 separate bubbles and ending up in a single bubble. The bubble colors are 
computed from thin film interference determined by the computed fluid thickness in 
the lamellae. 
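The thin-film interference coloring can be illustrated with the classical Airy reflectance of a single film of refractive index $n$ in air at normal incidence, $R = 4r^2\sin^2(\delta/2) / \left[(1-r^2)^2 + 4r^2\sin^2(\delta/2)\right]$ with $r = (n-1)/(n+1)$ and $\delta = 4\pi n d/\lambda$. This textbook optics formula is offered as an assumption about how such colors may be computed, not as the rendering procedure of [28, 29]; the function name and the default $n = 1.33$ are illustrative.

```python
import numpy as np


def film_reflectance(thickness, wavelength, n=1.33):
    """Airy reflectance of a free-standing film (air | film | air), normal incidence.

    thickness and wavelength must use the same length unit.
    """
    r = (n - 1.0) / (n + 1.0)                          # Fresnel amplitude coefficient
    delta = 4.0 * np.pi * n * thickness / wavelength   # round-trip phase difference
    s = 4.0 * r**2 * np.sin(delta / 2.0) ** 2
    return s / ((1.0 - r**2) ** 2 + s)
```

Evaluating it at a few wavelengths (for example, red, green, and blue) for each computed lamella thickness gives a rough color. A film much thinner than the wavelength reflects almost nothing, consistent with draining films appearing dark shortly before rupture.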


5 Rotary Bell Painting in the Automotive Industry 


In manufacturing settings, paints are frequently applied by an electrostatic rotary 
bell atomizer. Paint flows to a cup rotating at 10,000 to 70,000 rpm and is driven by
centrifugal forces to form thin sheets and tendrils at the cup edge, where it then tears 
apart into dispersed droplets. Vortical structures generated by shaping air currents 
are key to shearing these sheets and transporting paint droplets. Advantages of this 
manufacturing process include the ability to paint at high volume and to achieve 
uniform consistency in the paint application. 
Schematic of paint flow and air currents [21] in rotary bell atomizing applications
Understanding the generation, size distribution, delivery, and adhesion of these 
paint droplets is a problem of considerable importance. For example, (a) much of the 
energy involved in automotive assembly is associated with the paint process; (b) a 
significant amount of paint does not attach to the cars and ends up as pollutants; and 
(c) 10-20% of automobiles need to be repainted due to aberrations in the process. 
The goals of computational modeling of the rotary bell delivery system include:


• Optimizing the atomization process for higher paint flow rates to obtain more
uniform and consistent atomization in the 30,000 to 60,000 rpm range.

• Studying the atomization process as a function of paint fluid properties (such as
density, viscosity, and surface tension) and physical properties, such as inflow
rates, bell rotation speeds, and shaping air currents.

• Analyzing film dynamics, particularly in the immediate atomization zone adjacent
to the cup edge, including the dynamics of filament formation, droplet size and
distribution, and droplet trajectories.


5.1 Computational Challenges 


The computational challenges posed by the painting delivery mechanism are
formidable. The range of physical parameters is substantial. The droplet size ranges
from 5 to 100 μm, the films are 10-50 μm thick, while the rotary bell diameter
is on the order of centimeters. The cup edge moves at 200 m/s, and droplets break up over
microseconds, whereas droplet statistics require milliseconds. As such, modeling
requires tracking droplets across a wide range of length scales, paint fluid mechanics
is subject to high centrifugal and Coriolis forces, and the impact of highly vortical
air structures on film sheeting requires careful resolution.
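As a quick consistency check on these numbers, the edge speed of a cup of diameter $D$ spinning at $N$ rpm is $v = \pi D N / 60$. The 6.5 cm diameter used below is an illustrative assumption, not a value from the text.

```python
import math


def rim_speed(rpm, diameter_m):
    """Edge speed in m/s of a cup of the given diameter spinning at rpm."""
    omega = 2.0 * math.pi * rpm / 60.0  # angular speed in rad/s
    return omega * diameter_m / 2.0


for rpm in (10_000, 30_000, 60_000, 70_000):
    print(rpm, round(rim_speed(rpm, 0.065), 1))
```

At 60,000 rpm this gives roughly 204 m/s, in line with the edge speed quoted above.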
From a computational point of view, these translate into daunting challenges: 


• Interfaces are very contorted and complex.

• Very thin sheets of paint roll off, and then break into droplets.

• Fluid dynamics is highly three-dimensional with gas eddies playing a key role.

• Droplets are tiny, and break off and subsequently merge in highly complex ways.

• Mass conservation is important: tracking and accurately accounting for small
droplets is critical, since all the paint ultimately breaks into such small objects.


These translate into several modeling/mathematical/algorithmic/numerical chal- 
lenges which must be tackled in order to build a workable approach, including: 


• High-order accurate fluid solvers and sharp interface physics: The standard level
set approach to tracking two- or multi-phase fluid problems is to solve both the
evolving level set equation and the Navier-Stokes equations on a background fixed
mesh, smearing forces, jump conditions, and discontinuities across the air/fluid
interfaces through mollified delta functions into forcing terms on the background
mesh. Because the droplets are so small, and because the viscosity/density jumps
are so large, this approach is too inaccurate. Instead, we need to employ incompressible
Navier-Stokes solvers that allow us to represent these forces sharply, by
using implicitly defined meshes that adapt to the moving geometry of the liquid-gas
interface.

• Hybrid interface solvers coupled to high-order fluid solvers: Coupling
these high-order fluid solvers to the interface dynamics requires building accurate
methods to allow information transfer between the background Cartesian level set
mesh and the unstructured interface-fitted mesh.

• Non-Newtonian fluids: Another complex challenge stems from the fact that paint is
non-Newtonian. One must carefully design and embed experimental shear
stress models inside numerical calculations.

• Mesh adaptivity: In order to capture the shaping air currents and spinning bell,
which occupy large length scales, as well as the smallest scales of droplets and
thin films, we need to employ aggressive adaptive mesh refinement strategies.

• Multi-core high performance computing: This is an involved calculation, requiring
small time steps, many mesh elements, and highly accurate elliptic solvers.
Attention must be paid to parallel implementations on sophisticated computing
architectures.


5.2 Level Set Methods and High-Order Multiphase Flow 


The central problem in applying level set methods is that the equations of motion 
need to include jump conditions at the air-paint interface, e.g., droplet boundaries. 
The usual level set approach of “smearing” forces to a background mesh in order to 
provide source terms to the incompressible Navier-Stokes equations is problematic. 
The droplets can be so small, and the density/viscosity jumps so large and sharp, that 
this mollified approach does not provide the required accuracy. 

Instead, we make use of an algorithmic technology building on implicitly-defined 
meshes [22-24]. There are several ideas at work in this approach: 


• First, two-phase incompressible flow is solved using a discontinuous Galerkin
(DG) approach, with a level set method used to track paint-air interfaces.
Fig. 11 Implicitly defined meshes using multi-phase cell merging. Left: Phase cells, defined by
the intersection of each phase (blue and green) with the cells of a background Cartesian/quadtree 
grid, are classified according to whether they fall entirely within one phase, entirely outside the 
domain, or according to whether they have a small or large volume fraction. Right: Small cells 
are merged with neighboring cells in the same phase to form a finite element mesh composed of 
standard rectangular elements and elements with curved, implicitly defined boundaries. Figures 
adapted from [23, 24] 


• The level set equation is solved using finite differences on a fixed background mesh
in a time-evolving narrow-band data structure.

• The zero level set corresponding to the paint-air boundaries, which cuts through
the cells of a background octree grid, is used to drive a cell-merging procedure
which creates an implicitly defined mesh whose element shapes exactly coincide
with the curved geometry of the interface; see Fig. 11.

• This mesh is used to accurately incorporate the now body-aligned interface jump
conditions in the DG solver.
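The classification and merging idea of Fig. 11 can be sketched for a single phase as follows. The volume-fraction threshold of 0.3, the rule of merging into the face neighbor holding the largest fraction, and the function name are all hypothetical simplifications rather than the algorithm of [23, 24]; a production implementation must also handle cases this sketch ignores, such as a small cell whose neighbors are all small.

```python
import numpy as np


def merge_small_cells(frac, threshold=0.3):
    """Map each cell that intersects one phase to the element it belongs to.

    frac[j, i] is the fraction of background cell (j, i) occupied by the phase.
    Cells with frac == 0 lie outside the phase and are skipped; cells with
    frac >= threshold become elements on their own; smaller cut cells are
    merged into the face neighbour holding the largest fraction of the phase.
    """
    ny, nx = frac.shape
    owner = {}
    for j in range(ny):
        for i in range(nx):
            f = frac[j, i]
            if f <= 0.0:
                continue
            if f >= threshold:
                owner[(j, i)] = (j, i)
                continue
            best, best_f = (j, i), -1.0
            for dj, di in ((0, 1), (0, -1), (1, 0), (-1, 0)):
                jj, ii = j + dj, i + di
                if 0 <= jj < ny and 0 <= ii < nx and frac[jj, ii] > best_f:
                    best, best_f = (jj, ii), frac[jj, ii]
            # Keep the cell on its own if no neighbour is large enough.
            owner[(j, i)] = best if best_f >= threshold else (j, i)
    return owner
```

Merging removes cut cells with tiny volume fractions, which would otherwise produce badly conditioned element matrices in the DG discretization.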


Adaptivity: The next issue stems from the fact that there is a wide range of physical
space scales involved in the process. The paint comes off the bell as a very thin
film, and then breaks into small droplets; as such, computing on a uniform mesh is
impractical. Instead, we employ adaptively refined meshes wherein the mesh resolution
adapts to such triggers as: (a) the distance to the liquid-gas interface; (b) the
curvature of the interface; (c) the thickness of droplets, tendrils, and films; and (d) the
proximity to the bell cup. See, for example, Fig. 12.
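Triggers (a)-(d) can be combined into a simple per-cell refinement flag. All thresholds below, expressed in multiples of the local cell size $h$, and the function name are illustrative assumptions rather than values used in [31].

```python
import numpy as np


def refine_flags(dist_interface, curvature, thickness, dist_bell, h,
                 band=4.0, max_kappa_h=0.25, min_cells=4.0, bell_band=8.0):
    """Return a boolean array: True for cells that should be refined."""
    dist_interface = np.asarray(dist_interface, dtype=float)
    curvature = np.asarray(curvature, dtype=float)
    thickness = np.asarray(thickness, dtype=float)
    dist_bell = np.asarray(dist_bell, dtype=float)
    return ((dist_interface < band * h)              # (a) near the liquid-gas interface
            | (np.abs(curvature) * h > max_kappa_h)  # (b) curvature poorly resolved
            | (thickness < min_cells * h)            # (c) thin droplet, tendril or film
            | (dist_bell < bell_band * h))           # (d) close to the bell cup
```

Expressing each threshold relative to $h$ keeps the criteria meaningful at every refinement level.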


High performance computing: The above calculations are complex and the time 
step, spatial resolution, and physics make it impossible to model the entire bell. 
With a numerical framework targeting high performance computing facilities, using 
massively parallel MPI and OpenMP techniques, we can conduct high-resolution in- 
depth studies of rotary bell atomization on small wedges, about 5 degrees in angle, 
using tens of thousands of cores. In Fig. 13 we present one result from a large family 
of parameter studies. For further details, see [31]. 
Fig. 13 Three-dimensional model results of rotary bell atomization for time- and spatially-varying
inflow film thickness, high mesh resolution, and shaping air currents simulating nozzle inlets. In 
each of the nine panels, two viewpoints at the same time frame are given: a top-down perspective 
and a side-on view to show the vertical drifting of the shedding droplets, being pushed upwards by 
the shaping air currents. The liquid surface is colored copper, with the bell cup situated beneath 
6 Conclusions and Summary


We have reviewed a few examples in which interface dynamics contribute
profoundly to the efficiency of industrial processes, focusing
on the application of level set methods for interface tracking to these problems. We
have considered only a few contributions and works, and refer the interested reader
to the referenced review articles.
Abstract Numerical simulations for blood flows related to cardiovascular diseases
are presented. Differences in vessel morphologies produce different flow character- 
istics, stress distributions, and ultimately different outcomes. Some examples illus- 
trating the effects of curvature and torsion on blood flows are presented both for 
simplified and patient-specific simulations. The goal of this study is to understand 
relationships between geometrical characteristics of blood vessels and blood flow 
behaviors. 


1 Introduction 


In aging societies, cardiovascular conditions such as aortic aneurysms and aortic 
dissections persist as life-threatening diseases. Moreover, congenital diseases such 
as hypoplastic left heart syndrome constitute an important issue for our society. In 
recent years, patient-specific simulations have become common in the biomedical 
engineering field. Mathematical viewpoints are expected to contribute further and to
play important roles in this context. For instance, geometrical characterization of
blood vessels, which vary widely among individuals, provides useful information to 
medical sciences. Differences in blood vessel morphology give rise to different flow 
characteristics, which cause different stress distributions and outcomes. Therefore,
characterization of these vessels’ respective morphologies represents an important 
clinical question. Our objective in this study is to understand possible mechanisms 
connecting geometrical characteristics and stress distributions through flow
behaviors. The studies presented in this paper are part of a CREST [1] framework
supported by the Japan Science and Technology Agency in a strategic area for promoting
collaboration between mathematical science and other scientific fields. 


2 Numerical Methods and Results 


2.1 Governing Equations 


We adopted the incompressible Navier-Stokes equations as the governing equations.


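$$
\begin{aligned}
\frac{\partial u_i}{\partial t} + u_j \frac{\partial u_i}{\partial x_j} &= -\frac{1}{\rho}\frac{\partial p}{\partial x_i} + \nu \frac{\partial^2 u_i}{\partial x_j \partial x_j}, \\
\frac{\partial u_i}{\partial x_i} &= 0 \qquad \text{in } \Omega \times (0, T).
\end{aligned} \tag{1}
$$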


In those equations, $t$, $u_i$ ($i = 1, 2, 3$), $p$, $\rho$, and $\nu$ respectively represent time, velocity,
pressure, density, and the kinematic viscosity of blood. We assumed that blood
can be regarded as a Newtonian fluid in large arteries. Several numerical results obtained with
different numerical methods are presented in the following subsections. The finite difference
method is used in Sect. 2.2, where it is applied to blood flow in a thoracic aorta and to
flows in simple spiral tubes to examine torsion effects. Then, the finite element method
is applied in Sect. 2.3, where fluid-structure interaction (FSI) is considered and some
flow mechanisms in a configuration after Norwood surgery are examined.


2.2 Finite Difference Approximation 


2.2.1 Visualization of Flows in a Thoracic Aorta 


Effects of curvature on flows in curved tubes have been discussed extensively in
earlier studies [2-4]. When a tube is curved, a centrifugal force acts away from the
center of curvature, with a magnitude that depends on the axial component of the velocity.
Consequently, a secondary flow develops in the cross-section and forms a pair of
counter-rotating vortices called Dean's vortices, which play an important role in blood
flow through the aortic arch, where the curvature is strong.

Figure 1 presents streamlines visualized from numerical results
obtained in an earlier study [5]. We modeled the blood vessel as a rigid body and
applied the finite difference method on a centerline-fitted curvilinear coordinate system,
where the centerlines and cross-sections were extracted from CT
scans of patients with aortic aneurysms. The incompressible Navier-Stokes equations
were solved numerically with a boundary condition for the inflow velocity profile
given by a phase-contrast MRI measurement.

Figure 1a presents streamlines through the whole thoracic aorta at the peak systolic
phase. Circulation in the aneurysm is apparent. Figure 1b shows the Dean's vortices
on the aortic arch superimposed on the main axial flow. In Fig. 1c, a spiral flow is
apparent in the descending aorta.

Helicity, $\mathbf{u} \cdot (\nabla \times \mathbf{u})$, distinguishes swirling flow regions of opposite
rotational sense by its sign. Figure 2a depicts helicity isosurfaces of a positive and a negative
value, which show Dean's vortices generated at the aortic arch and subsequently flowing
down to the descending aorta. In Fig. 2b, an isosurface of the second largest eigenvalue
$\lambda_2$ of $S^2 + \Omega^2$, where $S$ and $\Omega$ respectively represent the symmetric and
antisymmetric parts of the velocity gradient tensor, also shows a swirling flow region [6].
Enstrophy, $|\nabla \times \mathbf{u}|^2$, indicates the strength of vorticity in Fig. 2c. In Fig. 2b, c,
the colors of the isosurfaces show $\lambda_2$ values.
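These three diagnostics can be evaluated from a velocity field sampled on a uniform grid. The sketch below is not the post-processing code of [5] or [6]: it approximates the velocity gradient with `numpy.gradient`, assumes the same spacing `h` in all directions, and takes $\lambda_2$ as the middle eigenvalue of $S^2 + \Omega^2$, where $S$ and $\Omega$ are the symmetric and antisymmetric parts of the velocity gradient.

```python
import numpy as np


def vortex_diagnostics(u, v, w, h):
    """Helicity, enstrophy and lambda_2 on a uniform grid indexed [x, y, z]."""
    # J[i, j] = d u_i / d x_j (second-order differences in the interior)
    J = np.array([np.gradient(c, h, h, h) for c in (u, v, w)])
    wx = J[2, 1] - J[1, 2]  # vorticity components
    wy = J[0, 2] - J[2, 0]
    wz = J[1, 0] - J[0, 1]
    helicity = u * wx + v * wy + w * wz
    enstrophy = wx**2 + wy**2 + wz**2
    G = np.moveaxis(J, (0, 1), (-2, -1))     # shape (..., 3, 3)
    S = 0.5 * (G + np.swapaxes(G, -1, -2))   # strain-rate tensor
    Om = 0.5 * (G - np.swapaxes(G, -1, -2))  # rotation tensor
    eig = np.linalg.eigvalsh(S @ S + Om @ Om)  # symmetric, ascending order
    return helicity, enstrophy, eig[..., 1]    # lambda_2 = middle eigenvalue
```

In the $\lambda_2$ criterion, negative values mark vortex cores. For solid-body rotation ($u = -y$, $v = x$, $w = 0$), the sketch returns zero helicity, enstrophy 4, and $\lambda_2 = -1$ everywhere.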


2.2.2 Effects of Torsion in Simple Spiral Tubes 


We also examined the effects of torsion using a pulsating flow in simple spiral tubes, 
as shown in [5]. The torsion of a three-dimensional curve is defined through the
Frenet-Serret formulas shown below.


Fig. 1 Instantaneous streamlines
Fig. 2 Several fluid dynamics quantities: (a) helicity, (b) $\lambda_2$, (c) enstrophy


Fig. 3 Secondary flows in a zero-torsion tube at the peak systolic, late systolic, and late diastolic phases


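$$
\frac{d}{ds}\begin{pmatrix} \mathbf{t} \\ \mathbf{n} \\ \mathbf{b} \end{pmatrix}
= \begin{pmatrix} 0 & \kappa & 0 \\ -\kappa & 0 & \tau \\ 0 & -\tau & 0 \end{pmatrix}
\begin{pmatrix} \mathbf{t} \\ \mathbf{n} \\ \mathbf{b} \end{pmatrix}. \tag{2}
$$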


Therein, $\kappa$ and $\tau$ respectively represent curvature and torsion, and $\mathbf{t}$, $\mathbf{n}$, and $\mathbf{b}$
respectively denote the tangent, normal, and binormal vectors.
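For a centerline sampled as points $\mathbf{r}(s_k)$, curvature and torsion can be estimated from the standard formulas $\kappa = |\mathbf{r}' \times \mathbf{r}''| / |\mathbf{r}'|^3$ and $\tau = (\mathbf{r}' \times \mathbf{r}'') \cdot \mathbf{r}''' / |\mathbf{r}' \times \mathbf{r}''|^2$, which hold for any regular parameter $s$ and match the sign convention of the Frenet-Serret system above. The finite-difference sketch below is an illustration, not the method of [5]; values near the ends of the sample are inaccurate, and torsion is undefined where $\kappa = 0$.

```python
import numpy as np


def curvature_torsion(points, s):
    """Curvature and torsion of a sampled 3D curve points[k] = r(s[k])."""
    d1 = np.gradient(points, s, axis=0)  # r'
    d2 = np.gradient(d1, s, axis=0)      # r''
    d3 = np.gradient(d2, s, axis=0)      # r'''
    cross = np.cross(d1, d2)
    cross_norm = np.linalg.norm(cross, axis=1)
    kappa = cross_norm / np.linalg.norm(d1, axis=1) ** 3
    tau = np.einsum("ij,ij->i", cross, d3) / cross_norm**2
    return kappa, tau
```

For a helix $(\cos t, \sin t, 0.5\,t)$, the interior estimates approach the exact values $\kappa = 0.8$ and $\tau = 0.4$; for a planar curve the torsion vanishes.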

Figures 3 and 4 portray secondary flows, obtained by subtracting the
main axial flow from the total flow velocity, at the peak systolic, late systolic, and
late diastolic phases for the zero-torsion (Fig. 3) and nonzero-torsion (Fig. 4) cases. When
the torsion is zero, the secondary flow is invariably symmetric. However, when the
torsion is nonzero, vortex merging occurs, and one large vortex persists in the diastolic
phase. This difference leads to differences in the torque exerted on the vessel walls.
Fig. 4 Secondary flows in a nonzero-torsion tube at the peak systolic, late systolic, and late diastolic phases


2.3 Finite Element Approximation 


2.3.1 Torsion Effects on Flows in the Thoracic Aorta 


Next we consider fluid-structure interaction (FSI) to examine torsion effects using
patient-specific morphologies [7]. Here, FSI analysis is handled with the Sequentially-Coupled
Arterial FSI (SCAFSI) technique [8] because the FSI problem here has temporally
periodic dynamics. The fluid mechanics equations are solved using the Space-Time
Variational Multiscale (ST-VMS) method [9-11]. First, we carry out a structural mechanics
computation to assess arterial deformation under an observed blood pressure profile over
a cardiac cycle. Then we carry out the fluid mechanics computation over a mesh that moves
to follow the lumen as the artery deforms. These steps are iterated, with the stress
obtained in the fluid mechanics computation used in the next structural mechanics
computation. To assess torsion effects, the torsion-free model geometry is generated by
projecting the original centerline onto its averaged plane of curvature, as presented in Fig. 5.
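The projection onto an averaged plane can be sketched under the assumption that this plane is the least-squares (principal component) plane of the centerline points; the exact construction used in [7] may differ, and the function name is hypothetical. Because a planar curve has zero torsion wherever its curvature is nonzero, the projected centerline is torsion-free by construction.

```python
import numpy as np


def project_to_mean_plane(points):
    """Project 3D centerline points onto their least-squares plane."""
    centroid = points.mean(axis=0)
    X = points - centroid
    # The right singular vector with the smallest singular value is the
    # normal of the best-fit plane through the centroid.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    normal = vt[-1]
    return points - np.outer(X @ normal, normal)
```

A curve that already lies in a plane is returned unchanged, so only the out-of-plane (torsional) part of the geometry is removed.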

Figure 6 presents secondary flows. On the left-hand side (projected shape), sym- 
metric Dean’s vortices are apparent, although they are not visible on the right-hand 
side (original shape), similarly to the simple spiral tubes in Fig. 4. 

Next we compare the wall shear stress (WSS) patterns corresponding to the
projected and the original geometries to examine the influence of torsion. Figure 7
presents WSS at the peak systolic phase. In the projected torsion-free shape, a high
WSS region is apparent at the aortic arch, which results from the strong Dean's twin
vortices, whereas no such region appears there in the original shape with torsion.


2.3.2 Flow Mechanism in Morphology After Norwood Surgery 


This subsection presents examples of patient-specific blood flow simulations at an 
anastomosis site after Norwood surgery for hypoplastic left heart syndrome. Our 
target is the geometry surrounding an anastomosis site of the aortic arch and pul- 
monary artery after Norwood surgery, which is one step taken during surgeries for 
Fig. 5 Projected and original shapes


(a) Projected shape (b) Original shape 


Fig. 6 Secondary flows in projected and original shapes 


hypoplastic left heart syndrome. The target geometry was extracted from a CT scan 
with boundary conditions obtained from ultrasound measurements. Here, we again 
adopt the rigid body assumption, i.e., not considering fluid-structure interactions. 
We use the SUPG/PSPG-stabilized finite element formulation with P1/P1 elements.

Figure 8a portrays instantaneous streamlines at the peak systolic phase, whereas 
Fig. 8b depicts the energy-dissipation distribution. Energy dissipation is a clinically 
important quantity because it imposes a load on the heart directly [12]. In Fig. 8b, 
high energy dissipation is apparent at the anastomosis site, which can be understood 
Fig. 7 Wall shear stresses at
peak systolic phase 


(a) Projected shape (b) Original shape 


(a) Streamlines (b) Energy dissipation 


Fig. 8 Streamlines at an anastomosis site after Norwood surgery 


straightforwardly because the velocity is extremely high there. Although high energy 
dissipation is also apparent in the descending aorta, it cannot be qualified straightfor- 
wardly. This dissipation apparently derives from spiral flow there, which is generated 
at the aortic arch immediately after blood passes out of the thin anastomosis channel, 
as shown in Fig. 9. Here, a relation can be found between morphology and energy 
dissipation patterns through flow structures. 
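How the energy dissipation in Fig. 8b is defined is not stated in this section. Assuming the standard viscous dissipation function of an incompressible Newtonian fluid with viscosity $\mu$ and rate-of-strain tensor $D(u)$,

$$\Phi = 2\mu\, D(u):D(u), \qquad D(u) = \tfrac{1}{2}\left(\nabla u + \nabla u^{\top}\right),$$

a short worked example illustrates why high velocity goes with high dissipation. For Poiseuille flow in a straight rigid tube of radius $R$ with axial velocity $u_z(r) = U(1 - r^2/R^2)$, the only nonzero velocity gradient is $\partial u_z/\partial r = -2Ur/R^2$, so $D(u):D(u) = \tfrac{1}{2}(\partial u_z/\partial r)^2$ and the dissipation per unit tube length is

$$\int_0^R 2\mu\, D(u):D(u)\; 2\pi r\,dr = \int_0^R \mu\,\frac{4U^2 r^2}{R^4}\,2\pi r\,dr = 2\pi\mu U^2 .$$

This equals the flow rate $\pi R^2 U/2$ times the pressure gradient $4\mu U/R^2$, and it grows quadratically with the centerline velocity $U$, which is consistent with the strong dissipation observed in the narrow, fast anastomosis channel.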
Fig. 9 Front and back views of streamlines


3 Conclusions 


We have presented some relations between geometrical characteristics of blood vessels and flow behaviors. These relations are expected to explain how and why vessel morphologies affect WSS distributions and energy dissipation. As described in Sect. 2.2, vessel curvature induces Dean's vortices as a secondary flow through centrifugal force, thereby creating strong WSS in curved segments. Moreover, Dean's vortices behave differently depending on whether torsion is present. In the example of a post-Norwood morphology, an energy-dissipation pattern in the descending aorta can be explained through flow structures. As a next step, predictions based on geometrical characteristics of blood vessels are expected to contribute to better risk assessment and surgical planning through mathematical modelling and numerical simulation.
Haitao Leng, Dong Wang, Huangxin Chen, and Xiao-Ping Wang


Abstract We develop an efficient iterative thresholding method for topology optimization for the Navier-Stokes flow. The method minimizes an objective energy functional consisting of the potential power in the fluid and a penalization of the fluid-solid interface perimeter, subject to a fluid volume constraint and the incompressible Navier-Stokes equation; the perimeter is approximated by a nonlocal energy. The method is an iterative scheme that alternates two steps: (1) solving a system containing the Brinkman equation and an adjoint system, and (2) convolution and thresholding. Various numerical experiments in two and three dimensions demonstrate the performance of the proposed method.


1 Introduction 


Topology optimization was originally developed for optimal design in structural mechanics [3, 4, 6]. Nowadays it attracts much attention owing to its wide application in industrial problems such as the optimization of transport vehicles and biomechanical structures. So far, the density method [5, 31] has been well developed for the implementation of topology optimization. It was originally developed for stiffness design and compliant mechanisms [32, 33] and has been applied to various physical problems such as acoustics, electromagnetics, fluid flow, and thermal problems [7, 11, 15, 24, 34]. In fluid mechanics, the density method was first developed by Borrvall and Petersson [7] for topology optimization for the Stokes flow. It was then extended to the Darcy-Stokes flow [21, 43], the Navier-Stokes flow [12, 18, 20, 27, 36, 47], non-Newtonian flow [30], turbulent flow [13], and more complicated fluidic devices [1, 25, 26]. Approaches based on topological sensitivity analysis (which provides an asymptotic expansion of a shape function with respect to the size of a small inclusion inserted inside the domain) can also be used for shape optimization for Stokes flows [22] and Navier-Stokes flows [2]. The discrete optimization problem arising in topology optimization is generally solved by the method of moving asymptotes (MMA) [35], level set based methods [8, 36, 47], or phase field based methods [18].

The threshold dynamics method developed by Merriman, Bence, and Osher (MBO) [23] is an efficient method for approximating mean curvature flow. In this method, the interface is implicitly represented by the characteristic functions of the domains. The method alternates two simple steps: convolution of the characteristic functions with a heat kernel, and pointwise thresholding. Esedoglu and Otto later generalized the original MBO method to multiphase problems with arbitrary surface tensions [17]. The method has attracted much attention and has been extended to many other applications, such as image processing [16, 37, 39], wetting dynamics [38, 44, 45], and target-valued problems [28, 29, 40-42].
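The two MBO steps can be sketched in a few lines. This is a minimal illustration, assuming a square periodic grid on $[0, L)^2$ so that the heat-kernel convolution reduces to a product in Fourier space (the Fourier transform of $G_\tau(x) = (4\pi\tau)^{-d/2} e^{-|x|^2/(4\tau)}$ is $e^{-\tau|\xi|^2}$). The function names are illustrative and not taken from [23] or [17].

```python
import numpy as np


def heat_convolve(f, tau, L=1.0):
    """Periodic convolution G_tau * f on an n x n grid over [0, L)^2, via FFT.

    G_tau(x) = (4*pi*tau)^(-d/2) * exp(-|x|^2 / (4*tau)) has Fourier
    transform exp(-tau * |xi|^2), where xi is the angular wavenumber.
    """
    n = f.shape[0]
    xi = 2.0 * np.pi * np.fft.fftfreq(n, d=L / n)  # angular wavenumbers
    kx, ky = np.meshgrid(xi, xi, indexing="ij")
    symbol = np.exp(-tau * (kx**2 + ky**2))
    return np.real(np.fft.ifft2(symbol * np.fft.fft2(f)))


def mbo_step(chi, tau, L=1.0):
    """One MBO step: diffuse the characteristic function, then threshold at 1/2."""
    return (heat_convolve(chi, tau, L) >= 0.5).astype(float)


# Usage: a disk of radius 0.3 shrinks under (approximate) mean curvature flow.
n = 128
x = (np.arange(n) + 0.5) / n
X, Y = np.meshgrid(x, x, indexing="ij")
chi0 = ((X - 0.5) ** 2 + (Y - 0.5) ** 2 < 0.3**2).astype(float)
chi = chi0
for _ in range(10):
    chi = mbo_step(chi, tau=2e-3)
```

Each step costs $O(N \log N)$ for $N$ grid points. On a grid, $\sqrt{\tau}$ should span several grid spacings; otherwise the interface moves less than one cell per step and can become pinned.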

In this paper we extend the iterative thresholding method developed in [9] to topology optimization for the Navier-Stokes flow. The porous medium approach based on the density method is used in the algorithm: a Darcy term is introduced into the Navier-Stokes equation to “interpolate” between the Navier-Stokes equation in the fluid region and Darcy flow through a porous medium (a weakened solid region with low permeability), which yields the Brinkman equation. The total energy then consists of the potential power in the fluid, the perimeter regularization, and a Darcy term. The perimeter term is computed from the convolution between the heat kernel and the characteristic functions of the regions. Each iteration of the proposed algorithm has two steps. The first step is to solve the Brinkman equation and an adjoint system, both of which can be solved efficiently with the mixed finite element method. The second step is to update the fluid-solid regions by a simple convolution and thresholding step. On a uniform grid, the convolution can be computed efficiently by the fast Fourier transform (FFT) with computational complexity $O(N \log N)$. A variety of numerical experiments in two and three dimensions verify the efficiency of the proposed algorithm. In addition, the numerical results indicate that the total energy decays.

The paper is organized as follows. In Sect. 2, we introduce the mathematical model, its approximation, and the derivation of the iterative thresholding method. The numerical implementation is discussed in Sect. 3. We verify the performance through extensive numerical experiments in Sect. 4 and draw some conclusions in Sect. 5.
2 Derivation of the Method


2.1 The Mathematical Model 


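In this section, we consider the mathematical model for topology optimization for the Navier-Stokes flow. Denote by $\Omega \subset \mathbb{R}^d$ ($d = 2, 3$) the computational domain, which is fixed throughout the optimization, and assume that $\Omega$ is a bounded Lipschitz domain with outer unit normal $n$ such that $\mathbb{R}^d \setminus \Omega$ is connected. Furthermore, we denote by $\Omega_0 \subset \Omega$ the domain of the fluid, which is a Caccioppoli set whose boundary is measurable and has an (at least locally) finite measure, and by $\Omega \setminus \Omega_0$ the domain of the solid. Our goal is to determine an optimal shape of $\Omega_0$ that minimizes the following objective functional, consisting of the total potential power and a perimeter regularization term:

$$\min_{(\Omega_0, u)} J_0(\Omega_0, u) = \int_\Omega \frac{\mu}{2} |\nabla u|^2 \, dx + \gamma |\Gamma| \tag{1}$$

subject to

$$\begin{aligned}
&\nabla \cdot u = 0 && \text{in } \Omega_0, && (2a)\\
&(u \cdot \nabla) u + \nabla p - \nabla \cdot (\mu \nabla u) = 0 && \text{in } \Omega_0, && (2b)\\
&u = 0 && \text{in } \Omega \setminus \Omega_0 \text{ and on } \partial\Omega_0, && (2c)\\
&u|_{\partial\Omega} = u_D && \text{on } \partial\Omega, && (2d)\\
&|\Omega_0| = \beta |\Omega| && \text{with a fixed parameter } \beta \in (0, 1). && (2e)
\end{aligned}$$

Here, $u: \Omega \to \mathbb{R}^d$ is the velocity, $\mu$ is the viscosity of the fluid, $p$ is the pressure, $u_D: \partial\Omega \to \mathbb{R}^d$ is a given function, $|\Gamma|$ is the perimeter of the interface (i.e., $\Gamma = \partial\Omega_0$), and $\gamma > 0$ is a weighting parameter.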


2.2 The Relaxation and Approximation of the Problem 


Since the goal is to minimize the objective functional (1) subject to the constraints (2) with respect to the fluid-solid interface, a proper representation of the interface is needed. Motivated by [9, 17, 37, 44], we use the characteristic function $\chi_1$ of the fluid domain (i.e., $\Omega_0$) to implicitly represent the fluid-solid interface, i.e.,

$$\chi_1(x) := \begin{cases} 1, & \text{if } x \in \Omega_0, \\ 0, & \text{otherwise.} \end{cases}$$
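$\chi_2(x) = 1 - \chi_1(x)$ denotes the characteristic function of $\Omega \setminus \Omega_0$. Then the interface $\Gamma$ is implicitly represented by $\chi_1$ and $\chi_2$. Under this representation, $|\Gamma|$ can be approximated by

$$|\Gamma| \approx \sqrt{\frac{\pi}{\tau}} \int_\Omega \chi_1 \, G_\tau * \chi_2 \, dx = \sqrt{\frac{\pi}{\tau}} \int_\Omega \chi_1 \, G_\tau * (1 - \chi_1) \, dx, \tag{3}$$

where $G_\tau(x) = \frac{1}{(4\pi\tau)^{d/2}} \exp\left(-\frac{|x|^2}{4\tau}\right)$ ($d = 2, 3$) is the Gaussian kernel and $*$ denotes the convolution [17].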
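To see that the right-hand side of (3) indeed measures length, the following sketch evaluates it for a disk of radius $R$ on a periodic unit square. Using a periodic FFT convolution in place of the convolution on $\Omega$ is an assumption made only for this illustration; for $\tau$ small compared with $R^2$, the value should be close to $2\pi R$.

```python
import numpy as np


def heat_convolve(f, tau, L=1.0):
    """Periodic convolution G_tau * f on an n x n grid over [0, L)^2, via FFT."""
    n = f.shape[0]
    xi = 2.0 * np.pi * np.fft.fftfreq(n, d=L / n)  # angular wavenumbers
    kx, ky = np.meshgrid(xi, xi, indexing="ij")
    return np.real(np.fft.ifft2(np.exp(-tau * (kx**2 + ky**2)) * np.fft.fft2(f)))


def perimeter_approx(chi1, tau, L=1.0):
    """Right-hand side of (3): sqrt(pi/tau) * int chi1 * G_tau * (1 - chi1) dx."""
    h = L / chi1.shape[0]
    integrand = chi1 * heat_convolve(1.0 - chi1, tau, L)
    return np.sqrt(np.pi / tau) * integrand.sum() * h**2  # midpoint quadrature


n, R = 256, 0.25
x = (np.arange(n) + 0.5) / n
X, Y = np.meshgrid(x, x, indexing="ij")
disk = ((X - 0.5) ** 2 + (Y - 0.5) ** 2 < R**2).astype(float)
approx = perimeter_approx(disk, tau=1e-3)
exact = 2.0 * np.pi * R  # about 1.5708
```

The factor $\sqrt{\pi/\tau}$ is what normalizes the result: for a flat interface, $\int \chi_1\, G_\tau * (1 - \chi_1)\,dx$ equals $\sqrt{\tau/\pi}$ per unit length, so multiplying by $\sqrt{\pi/\tau}$ recovers length.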

Similar to [9], to avoid solving the Navier-Stokes equation in a changing domain at each iteration, the porous medium approach [18] is used to “interpolate” between the Navier-Stokes equation in the fluid region (i.e., $\{x \mid \chi_1(x) = 1\}$) and $u = 0$ in the solid region (i.e., $\{x \mid \chi_2(x) = 1\}$) by introducing an additional penalization term $\alpha(x) u$ as follows:

$$\begin{aligned}
&\nabla \cdot u = 0 && \text{in } \Omega, && (4a)\\
&(u \cdot \nabla) u + \nabla p - \nabla \cdot (\mu \nabla u) + \alpha(x) u = 0 && \text{in } \Omega, && (4b)\\
&u|_{\partial\Omega} = u_D && \text{on } \partial\Omega, && (4c)\\
&\int_\Omega \chi_1 \, dx = \beta |\Omega|. && && (4d)
\end{aligned}$$


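Accordingly, the original objective functional (1) can be approximated by adding a Darcy penalty term as follows:

$$J^\tau(\chi, u) = \int_\Omega \left( \frac{\mu}{2} |\nabla u|^2 + \frac{\alpha(x)}{2} |u|^2 \right) dx + \gamma \sqrt{\frac{\pi}{\tau}} \int_\Omega \chi \, G_\tau * (1 - \chi) \, dx, \tag{5}$$

where $\chi$ denotes the characteristic function of the solid domain, i.e., $\chi = \chi_2$.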

We now discuss how to compute $\alpha$ in the current representation of the interface (i.e., using characteristic functions). Theoretically, $\alpha$ should be large enough in the solid domain to enforce $u = 0$ and close to 0 in the fluid domain so that $u$ satisfies the Navier-Stokes equation. For numerical reasons, we relax $\alpha$ to a smooth function that changes rapidly across the interface. We use the 0.5 level set of $G_\tau * \chi$ to approximate the position of the interface $\Gamma$; $G_\tau * \chi$ is a smooth function with values in $[0, 1]$ that changes from 0 to 1 within an $O(\sqrt{\tau})$ transition region. Thus, we compute $\alpha$ by

$$\alpha(x) = \alpha_\chi = \alpha \, G_\tau * \chi, \tag{6}$$

where $\alpha$ on the right-hand side is a sufficiently large constant. Hence, by the porous medium approach, we can solve the system (4) in the fixed domain $\Omega$.
Finally, using (6), we arrive at the following formulation of the problem:


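$$\min J^\tau(\chi, u) = \int_\Omega \left( \frac{\mu}{2} |\nabla u|^2 + \frac{\alpha}{2} (G_\tau * \chi) |u|^2 \right) dx + \gamma \sqrt{\frac{\pi}{\tau}} \int_\Omega \chi \, G_\tau * (1 - \chi) \, dx \tag{7}$$

subject to

$$\begin{aligned}
&\chi \in \mathcal{B} := \left\{ \chi \in BV(\Omega) \;\middle|\; \chi(x) \in \{0, 1\} \text{ a.e., and } \int_\Omega (1 - \chi) \, dx = \beta |\Omega| \right\}, && (8a)\\
&\nabla \cdot u = 0 \quad \text{in } \Omega, && (8b)\\
&(u \cdot \nabla) u + \nabla p - \nabla \cdot (\mu \nabla u) + (\alpha G_\tau * \chi) u = 0 \quad \text{in } \Omega, && (8c)\\
&u|_{\partial\Omega} = u_D \quad \text{on } \partial\Omega. && (8d)
\end{aligned}$$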


2.3 Derivation of the Method 


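In this section, we derive an iterative scheme to find an approximate solution of (7) and (8). Denote

$$\mathcal{U} := \{ u \in H^1(\Omega, \mathbb{R}^d) \mid \nabla \cdot u = 0, \ u|_{\partial\Omega} = u_D \} \quad \text{and} \quad \mathcal{V} := \{ v \in H_0^1(\Omega, \mathbb{R}^d) \mid \nabla \cdot v = 0 \}.$$

To derive the first-order necessary optimality conditions for a solution $(\chi_\tau, u_\tau)$ of (7) and (8), we introduce the Lagrangian $\mathcal{L}^\tau: \mathcal{B} \times \mathcal{U} \times \mathcal{V} \to \mathbb{R}$ by

$$\mathcal{L}^\tau(\chi, u, \tilde u) := J^\tau(\chi, u) + \int_\Omega \left( (u \cdot \nabla) u \cdot \tilde u + \mu \nabla u : \nabla \tilde u + (\alpha G_\tau * \chi) \, u \cdot \tilde u \right) dx,$$

where the pressure term does not appear because the multiplier $\tilde u \in \mathcal{V}$ is divergence-free and vanishes on $\partial\Omega$. The variational inequality is formally derived from

$$\left( \frac{\delta \mathcal{L}^\tau}{\delta \chi}(\chi_\tau, u_\tau, \tilde u_\tau), \ \chi - \chi_\tau \right) \ge 0, \quad \forall \chi \in \mathcal{B}, \tag{9}$$

and the adjoint equation can be deduced from

$$\left( \frac{\delta \mathcal{L}^\tau}{\delta u}(\chi_\tau, u_\tau, \tilde u_\tau), \ v \right) = 0, \quad \forall v \in \mathcal{V}, \tag{10}$$

where $(\cdot, \cdot)$ denotes the $L^2$-inner product.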
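To be specific, if $(\chi_\tau, u_\tau) \in \mathcal{B} \times \mathcal{U}$ is a minimizer of (7) and (8), then the following inequality holds:

$$\left( \gamma \sqrt{\frac{\pi}{\tau}} \, G_\tau * (1 - 2\chi_\tau) + \frac{\alpha}{2} G_\tau * |u_\tau|^2 + \alpha G_\tau * (u_\tau \cdot \tilde u_\tau), \ \chi - \chi_\tau \right) \ge 0, \quad \forall \chi \in \mathcal{B}, \tag{11}$$

where $\tilde u_\tau$ is the solution of the following adjoint system at $(u_\tau, \chi_\tau)$:

$$\begin{aligned}
&-(u_\tau \cdot \nabla) u_\tau - (u_\tau \cdot \nabla) \tilde u + (\nabla u_\tau)^\top \tilde u + \nabla \tilde p - \nabla \cdot (\mu \nabla \tilde u) + (\alpha G_\tau * \chi_\tau) \tilde u = 0, && (12a)\\
&\nabla \cdot \tilde u = 0, && (12b)\\
&\tilde u|_{\partial\Omega} = 0. && (12c)
\end{aligned}$$

Here, $\tilde p$ is the pressure associated with the adjoint system.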

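Based on the first-order necessary optimality condition, to solve (7) and (8) we use an iterative scheme that decreases the value of the objective functional, with $u$ satisfying (8) and $\tilde u$ satisfying (12). Without loss of generality, assume that the $k$-th iterate $\chi^k$ is given; we compute $(u^k, \tilde u^k)$ by solving the following system:

$$\begin{aligned}
&\nabla \cdot u = 0, \\
&\nabla \cdot \tilde u = 0, \\
&(u \cdot \nabla) u + \nabla p - \nabla \cdot (\mu \nabla u) + (\alpha G_\tau * \chi^k) u = 0, \\
&-(u \cdot \nabla) u - (u \cdot \nabla) \tilde u + (\nabla u)^\top \tilde u + \nabla \tilde p - \nabla \cdot (\mu \nabla \tilde u) + (\alpha G_\tau * \chi^k) \tilde u = 0, \\
&u|_{\partial\Omega} = u_D, \\
&\tilde u|_{\partial\Omega} = 0.
\end{aligned} \tag{13}$$

After $(u^k, \tilde u^k)$ are obtained from (13), $\chi^{k+1}$ is updated through

$$\chi^{k+1} = \arg\min_{\chi \in \mathcal{B}} \mathcal{L}^\tau(\chi, u^k, \tilde u^k). \tag{14}$$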


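We write the functional $\mathcal{L}^\tau(\chi, u^k, \tilde u^k)$ as $E^{\tau,k}(\chi)$:

$$E^{\tau,k}(\chi) := \mathcal{L}^\tau(\chi, u^k, \tilde u^k) = \int_\Omega \frac{\alpha}{2} (G_\tau * \chi) |u^k|^2 \, dx + \gamma \sqrt{\frac{\pi}{\tau}} \int_\Omega \chi \, G_\tau * (1 - \chi) \, dx + \int_\Omega \alpha (G_\tau * \chi) (u^k \cdot \tilde u^k) \, dx + N(u^k, \tilde u^k), \tag{15}$$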

An Iterative Thresholding Method for Topology ... 211 


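where $N(u^k, \tilde u^k)$ contains all other terms in $\mathcal{L}^\tau(\chi, u^k, \tilde u^k)$, which are independent of $\chi$. The only remaining problem is to minimize $E^{\tau,k}(\chi)$ on $\mathcal{B}$, i.e., to find $\chi^{k+1}$ such that

$$\chi^{k+1} = \arg\min_{\chi \in \mathcal{B}} E^{\tau,k}(\chi). \tag{16}$$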


We first relax (16) to a problem defined on a convex admissible set by finding $r^{k+1}$ such that

$$r^{k+1} = \arg\min_{r \in \mathcal{H}} E^{\tau,k}(r), \tag{17}$$

where $\mathcal{H}$ is the convex hull of $\mathcal{B}$:

$$\mathcal{H} := \left\{ r \in BV(\Omega) \;\middle|\; r(x) \in [0, 1] \text{ a.e., and } \int_\Omega r \, dx = (1 - \beta)|\Omega| \right\}.$$

The following lemma holds similarly to the one in [9], to which we refer for the details of a similar proof. Thus, we can solve the relaxed problem (17) instead of (16).


Lemma 2.1 Let $u \in H^1(\Omega, \mathbb{R}^d)$ be a given function and $r = (r_1, r_2)$. Then we have

$$\arg\min_{r \in \mathcal{H}} E^{\tau,k}(r) = \arg\min_{r \in \mathcal{B}} E^{\tau,k}(r).$$


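Next we show that (17) can be solved by a thresholding step. Because $E^{\tau,k}(r)$ is quadratic and concave in $r$, we first linearize $E^{\tau,k}(r)$ at $r^k$:

$$E^{\tau,k}(r) \approx E^{\tau,k}(r^k) + L^{\tau,k}_{r^k}(r - r^k),$$

where

$$L^{\tau,k}_{r^k}(r) = \int_\Omega \left( \gamma \sqrt{\frac{\pi}{\tau}} \, r \, G_\tau * (1 - 2r^k) + \frac{\alpha}{2} \, r \, G_\tau * |u^k|^2 + r \, \alpha G_\tau * (u^k \cdot \tilde u^k) \right) dx = \int_\Omega r \phi \, dx,$$

with $\phi = \gamma \sqrt{\frac{\pi}{\tau}} \, G_\tau * (1 - 2r^k) + \frac{\alpha}{2} G_\tau * |u^k|^2 + \alpha G_\tau * (u^k \cdot \tilde u^k)$. Then (17) can be approximately solved by

$$\chi^{k+1} = \arg\min_{r \in \mathcal{H}} L^{\tau,k}_{r^k}(r) = \arg\min_{r \in \mathcal{H}} \int_\Omega r \phi \, dx. \tag{18}$$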
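Why the linearization is safe to minimize can be checked in a few lines. Write $r = r^k + \eta$, let $\phi$ be as defined above with $L^{\tau,k}_{r^k}(r) = \int_\Omega r\phi\,dx$, and set $P(r) = \int_\Omega r\, G_\tau * (1 - r)\,dx$. The only properties used are the symmetry $G_\tau(x) = G_\tau(-x)$, which gives $\int f\,(G_\tau * g)\,dx = \int (G_\tau * f)\,g\,dx$, and the positivity of the Fourier transform $e^{-\tau|\xi|^2}$ of $G_\tau$. Treating the convolution as taken over the whole space (or periodically) is an idealization of the bounded domain made for this sketch. Expanding,

$$P(r^k + \eta) = P(r^k) + \int_\Omega \eta\, G_\tau * (1 - 2r^k)\,dx - \int_\Omega \eta\, G_\tau * \eta\,dx ,$$

and the last integral is nonnegative by Parseval's identity. The remaining terms of $E^{\tau,k}$ are linear in $r$, so

$$E^{\tau,k}(r) = E^{\tau,k}(r^k) + L^{\tau,k}_{r^k}(r - r^k) - \gamma\sqrt{\frac{\pi}{\tau}}\int_\Omega \eta\, G_\tau * \eta\,dx \;\le\; E^{\tau,k}(r^k) + L^{\tau,k}_{r^k}(r - r^k).$$

The linearization is therefore an upper bound on the energy. Since $r^k = \chi^k \in \mathcal{H}$, the minimizer of (18) satisfies $L^{\tau,k}_{r^k}(\chi^{k+1} - r^k) \le 0$, and hence $E^{\tau,k}(\chi^{k+1}) \le E^{\tau,k}(\chi^k)$.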


Then we have the following lemma, as in [9]; we again refer to [9] for the details of the proof.
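Lemma 2.2 Let $\phi = \gamma \sqrt{\frac{\pi}{\tau}} \, G_\tau * (1 - 2\chi^k) + \frac{\alpha}{2} G_\tau * |u^k|^2 + \alpha G_\tau * (u^k \cdot \tilde u^k)$ and

$$D^{k+1} = \{ x \in \Omega \mid \phi(x) < \delta \}$$

for some $\delta$ such that $|D^{k+1}| = (1 - \beta)|\Omega|$. Then, with $\chi^{k+1} = \chi_{D^{k+1}}$, we have $L^{\tau,k}_{\chi^k}(\chi^{k+1}) \le L^{\tau,k}_{\chi^k}(r)$ for all $r \in \mathcal{H}$.

The above lemma shows that (18) can be solved by

$$\chi^{k+1}(x) = \begin{cases} 1, & \text{if } \phi(x) < \delta, \\ 0, & \text{otherwise}, \end{cases}$$

where $\delta$ is a constant chosen such that $\int_\Omega \chi^{k+1} \, dx = (1 - \beta)|\Omega|$.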

To determine the value of $\delta$, one can treat $\int_\Omega \chi^{k+1} \, dx - (1 - \beta)|\Omega|$ as a function of $\delta$, i.e., $f(\delta) = \int_\Omega \chi^{k+1} \, dx - (1 - \beta)|\Omega|$, and use an iterative method (e.g., the bisection method or Newton's method) to find the root of $f(\delta) = 0$. For a uniform discretization of $\Omega$, a more efficient approach is the quick-sort technique proposed in [44]. Assume a uniform discretization of $\Omega$ with grid size $h$; then $\int_\Omega \chi^{k+1} \, dx$ can be approximated by $m h^d$, where $m$ is the number of grid points at which $\chi^{k+1} = 1$. If $(1 - \beta)|\Omega|$ is approximated by $M h^d$, we sort the values of $\phi$ in ascending order and simply set $\chi^{k+1} = 1$ at the first $M$ points (and $\chi^{k+1} = 0$ elsewhere).
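The sorting step can be sketched with NumPy as follows. This assumes that $\phi$ is stored as an array of nodal values and that $M$ is already an integer; the function name `threshold_by_sorting` is illustrative.

```python
import numpy as np


def threshold_by_sorting(phi, M):
    """Set chi = 1 at the M grid points with the smallest phi and chi = 0 elsewhere.

    This enforces the discrete volume constraint sum(chi) * h^d = M * h^d exactly,
    without computing the threshold value delta explicitly.
    """
    flat = np.asarray(phi, dtype=float).ravel()
    order = np.argsort(flat, kind="stable")  # ascending; ties broken by index
    chi = np.zeros(flat.size)
    chi[order[:M]] = 1.0
    return chi.reshape(np.shape(phi))


phi = np.array([[0.3, -1.0], [2.0, 0.1]])
chi = threshold_by_sorting(phi, M=2)  # ones where phi = -1.0 and phi = 0.1
```

A full sort costs $O(N \log N)$. `np.argpartition(flat, M - 1)[:M]` selects the same set of points (up to ties) in linear average time, although this matters little next to the cost of the FFTs and the finite element solves.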

Now, we arrive at Algorithm 1. 


Remark 2.1 Step 2 in Algorithm 1 decreases the energy, which can be proved similarly to [9], i.e.,

$$J^\tau(\chi^{k+1}, u^k) \le J^\tau(\chi^k, u^k).$$

In Step 1, however, we do not necessarily have

$$J^\tau(\chi^{k+1}, u^{k+1}) \le J^\tau(\chi^{k+1}, u^k),$$

because this step can be interpreted as a projection step, which could increase the value of the energy. Nevertheless, in the numerical experiments in Sect. 4, we checked the energy curves for all examples displayed, and all of them indicate that the algorithm has the energy-decaying property.


Remark 2.2 In the implementation, the stopping criterion is $\chi^{k+1} = \chi^k$ at every grid point. It is easy to see that a stationary solution obtained from Algorithm 1 satisfies the first-order necessary optimality conditions (8), (9), and (10).




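Algorithm 1 An iterative thresholding method for topology optimization for the Navier-Stokes flow


Input: Discretize $\Omega$ uniformly into a grid $\mathcal{T}_h$ with grid size $h$ and set $M = (1 - \beta)|\Omega|/h^d$. Set $\tau > 0$, $\alpha > 0$, $k = 0$, a tolerance parameter $tol > 0$, and the initial guess $\chi^0 \in \mathcal{B}$.


Iterative solution:
Step 1. Given $\chi^k$, update $u$ and $\tilde u$: solve the system

$$\begin{aligned}
&\nabla \cdot u = 0, \qquad \nabla \cdot \tilde u = 0,\\
&(u \cdot \nabla) u + \nabla p - \nabla \cdot (\mu \nabla u) + (\alpha G_\tau * \chi^k) u = 0,\\
&-(u \cdot \nabla) u - (u \cdot \nabla) \tilde u + (\nabla u)^\top \tilde u + \nabla \tilde p - \nabla \cdot (\mu \nabla \tilde u) + (\alpha G_\tau * \chi^k) \tilde u = 0,\\
&u|_{\partial\Omega} = u_D, \qquad \tilde u|_{\partial\Omega} = 0,
\end{aligned}$$

to obtain $u^k$ and $\tilde u^k$.
Step 2. Update $\chi$: evaluate

$$\phi^k = \gamma \sqrt{\frac{\pi}{\tau}} \, G_\tau * (1 - 2\chi^k) + \frac{\alpha}{2} G_\tau * |u^k|^2 + \alpha G_\tau * (u^k \cdot \tilde u^k),$$

sort the values of $\phi^k$ in ascending order, and set $\chi^{k+1} = 1$ at the first $M$ points and $\chi^{k+1} = 0$ elsewhere.
Step 3. Compute $e^k = \|\chi^{k+1} - \chi^k\|$. If $e^k < tol$, stop the iteration and go to the output step. Otherwise, set $k \leftarrow k + 1$ and continue the iteration.


Output: $(\chi, u)$ that approximately solves (7) subject to (8a)-(8d).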
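The outer loop of Algorithm 1 can be sketched as follows. The sketch works on a periodic $n \times n$ grid, so the convolutions in Step 2 reduce to FFTs. Step 1 is delegated to a hypothetical callback `solve_state_adjoint(chi)`, which stands in for the mixed finite element solver of Sect. 3 and returns $(u^k, \tilde u^k)$ as arrays of shape `(2, n, n)`. Taking the Step 3 norm to be the discrete $L^1$ norm is an assumption.

```python
import numpy as np


def heat_convolve(f, tau, L=1.0):
    """Periodic convolution G_tau * f on an n x n grid over [0, L)^2, via FFT."""
    n = f.shape[0]
    xi = 2.0 * np.pi * np.fft.fftfreq(n, d=L / n)
    kx, ky = np.meshgrid(xi, xi, indexing="ij")
    return np.real(np.fft.ifft2(np.exp(-tau * (kx**2 + ky**2)) * np.fft.fft2(f)))


def iterative_thresholding(chi0, solve_state_adjoint, beta, tau, alpha, gamma,
                           L=1.0, tol=1e-12, max_iter=100):
    """Sketch of Algorithm 1; chi = 1 marks the solid phase, as in (8a)."""
    n = chi0.shape[0]
    h = L / n
    M = int(round((1.0 - beta) * n * n))  # M = (1 - beta)|Omega| / h^d with d = 2
    chi = np.asarray(chi0, dtype=float)
    for k in range(max_iter):
        # Step 1: state and adjoint velocities for the current chi (hypothetical solver).
        u, u_adj = solve_state_adjoint(chi)
        # Step 2: evaluate phi^k with FFT convolutions, then keep the M smallest values.
        phi = (gamma * np.sqrt(np.pi / tau) * heat_convolve(1.0 - 2.0 * chi, tau, L)
               + 0.5 * alpha * heat_convolve(np.sum(u * u, axis=0), tau, L)
               + alpha * heat_convolve(np.sum(u * u_adj, axis=0), tau, L))
        chi_new = np.zeros(n * n)
        chi_new[np.argsort(phi.ravel(), kind="stable")[:M]] = 1.0
        chi_new = chi_new.reshape(n, n)
        # Step 3: discrete L1 change between consecutive iterates.
        err = np.abs(chi_new - chi).sum() * h**2
        chi = chi_new
        if err < tol:
            return chi, k + 1
    return chi, max_iter


# Usage with a dummy Step 1 (u = u_adj = 0): only the perimeter term drives the update.
def zero_solver(chi):
    z = np.zeros((2,) + chi.shape)
    return z, z


n = 64
x = (np.arange(n) + 0.5) / n
X, Y = np.meshgrid(x, x, indexing="ij")
square = ((np.abs(X - 0.5) < 0.25) & (np.abs(Y - 0.5) < 0.25)).astype(float)
chi, iters = iterative_thresholding(square, zero_solver, beta=0.75,
                                    tau=2e-3, alpha=1.0, gamma=1.0)
```

Passing Step 1 in as a function keeps the thresholding logic independent of the flow solver, so the same loop can be tested with a trivial solver before a finite element solver is connected.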


3 Numerical Implementation 


Now we describe the implementation of Algorithm 1, focusing on Step 1. The Navier-Stokes equations with a Darcy penalty term and the adjoint problem (13) are solved by the mixed finite element method, and the standard Taylor-Hood finite element space is used for discretization. Let $\mathcal{T}_h$ be a uniform grid of the domain $\Omega$, and let $\mathcal{N}_h$ be the set of all vertices of $\mathcal{T}_h$. For a given $\chi_h \in \mathcal{B}_h$, where $\mathcal{B}_h$ is the discrete version of $\mathcal{B}$ defined on $\mathcal{N}_h$, we introduce the Taylor-Hood finite element spaces

$$V_h := \{ v \in H^1(\Omega, \mathbb{R}^d) \mid v|_K \in [P_2(K)]^d, \ K \in \mathcal{T}_h \},$$

$$Q_h := \left\{ q \in L^2(\Omega, \mathbb{R}) \;\middle|\; \int_\Omega q \, dx = 0, \ q|_K \in P_1(K), \ K \in \mathcal{T}_h \right\}.$$

Let $V_h^D := \{ v \in V_h \mid v|_{\partial\Omega} = u_D^h \}$, where $u_D^h$ is a suitable approximation of the Dirichlet boundary condition $u_D$ on the boundary edges/faces of $\mathcal{T}_h$. For the state problem in (13), find $(u_h, p_h) \in V_h^D \times Q_h$ such that

$$\begin{aligned}
&((u_h \cdot \nabla) u_h, v_h) - (p_h, \nabla \cdot v_h) + (\mu \nabla u_h, \nabla v_h) + (\alpha(\chi_h) u_h, v_h) = 0, && \forall v_h \in V_h^0,\\
&(\nabla \cdot u_h, q_h) = 0, && \forall q_h \in Q_h,
\end{aligned}$$
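and $(\tilde u_h, \tilde p_h) \in V_h^0 \times Q_h$ such that

$$\begin{aligned}
&-((u_h \cdot \nabla) \tilde u_h, v_h) + ((\nabla u_h)^\top \tilde u_h, v_h) - (\tilde p_h, \nabla \cdot v_h) + (\mu \nabla \tilde u_h, \nabla v_h) + (\alpha(\chi_h) \tilde u_h, v_h) = ((u_h \cdot \nabla) u_h, v_h), && \forall v_h \in V_h^0,\\
&(\nabla \cdot \tilde u_h, q_h) = 0, && \forall q_h \in Q_h,
\end{aligned}$$

where $V_h^0 = V_h \cap H_0^1(\Omega)$. All of the above systems are solved by the standard Newton iteration, and the linear system in each iteration is solved by the generalized minimal residual method (GMRES).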

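We also note that the above weak formulations can be straightforwardly extended to problems with both a Dirichlet boundary $\Gamma_D$ and a Neumann boundary $\Gamma_N$, where $\Gamma_D \cap \Gamma_N = \emptyset$, $\Gamma_D \cup \Gamma_N = \partial\Omega$, and $(\mu \nabla u - pI) \cdot n|_{\Gamma_N} = g$.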

Once $u_h$ and $\tilde u_h$ are obtained, we use the FFT to compute $\phi_h$ at each node of $\mathcal{N}_h$ as follows:

$$\phi_h = \gamma \sqrt{\frac{\pi}{\tau}} \, G_\tau * (1 - 2\chi_h) + \frac{\alpha}{2} G_\tau * |u_h|^2 + \alpha G_\tau * (u_h \cdot \tilde u_h).$$

We then use $\phi_h$ to update the indicator function $\chi_h$ by the sorting strategy of Algorithm 1.


Remark 3.1 Similar to the adaptive-in-time strategy used in [9], we can turn Algorithm 1 into an adaptive algorithm by adjusting $\tau$ during the iterations. We set a threshold value $\tau_s$ and a tolerance $e_\tau$. If $e^k < e_\tau$, we let $\tau_{new} = \eta\tau$ with $\eta \in (0, 1)$ and update $\tau := \tau_{new}$ in the next iteration unless $\tau < \tau_s$. Otherwise, $\tau$ is not updated, and the iteration continues with the same $\tau$.
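The update rule in Remark 3.1 can be written as a small helper called after Step 3 of each iteration. This is a sketch with hypothetical argument names. It reads "unless $\tau < \tau_s$" as a check on the current value of $\tau$, which is one possible interpretation of the remark.

```python
def adapt_tau(tau, err, err_tol, tau_min, eta=0.5):
    """Adaptive rule of Remark 3.1 (a sketch; argument names are hypothetical).

    tau     -- current kernel width tau
    err     -- change e^k between consecutive iterates
    err_tol -- tolerance e_tau below which tau is reduced
    tau_min -- threshold tau_s; tau is no longer reduced once it is below it
    eta     -- reduction factor in (0, 1); Sect. 4 uses eta = 0.5
    """
    if not 0.0 < eta < 1.0:
        raise ValueError("eta must lie in (0, 1)")
    if err < err_tol and tau >= tau_min:
        return eta * tau  # the iteration has nearly stalled: sharpen the kernel
    return tau


tau = 0.01
tau = adapt_tau(tau, err=0.0, err_tol=1e-4, tau_min=1e-3)  # halved to 0.005
```

A large initial $\tau$ lets the interface move quickly at first. Once the iterates stop changing, shrinking $\tau$ narrows the $O(\sqrt{\tau})$ transition layer of $G_\tau * \chi$, which sharpens the resolved interface.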


4 Numerical Experiments 


In this section, we present extensive numerical examples to demonstrate the efficiency of our new algorithm with an adaptive strategy for the choice of $\tau$. We choose $\eta = 0.5$ in the update of $\tau$. When no confusion is possible, we still denote by $\tau$ its initial value in what follows. Also, we denote the Reynolds number by $Re = 1/\mu$.


4.1 Two Dimensional Results 


In this section, we test the performance of the proposed algorithm on two-dimensional problems on several different design domains, as displayed in Fig. 1. For most examples in this section, we assume a Dirichlet boundary condition with a parabolic profile, where the magnitude of the velocity is set as $|u_D| = g\left(1 - 4\left(\frac{t - a}{l}\right)^2\right)$ with $t \in [a - \frac{l}{2}, a + \frac{l}{2}]$. Here, $l$ is the length of the section of the boundary on which the inflow/outflow velocity is imposed, and $g$ is the prescribed velocity at the midpoint $a$ of the flow profile. The directions of the inflow/outflow velocity are illustrated in the design domain of each example.


Fig. 1 Design domains of two-dimensional examples
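The parabolic boundary profile is simple to tabulate. This sketch evaluates the formula for $|u_D|$ given above. Setting the speed to zero outside the boundary section is an assumption, because the text only defines the profile on $[a - l/2, a + l/2]$.

```python
import numpy as np


def parabolic_speed(t, g, l, a):
    """Inflow/outflow speed |u_D| = g * (1 - 4 * ((t - a) / l)**2) for |t - a| <= l / 2.

    t is the coordinate along the boundary section. Outside the section the
    speed is set to zero (an assumption for this sketch, not stated in the text).
    """
    s = (np.asarray(t, dtype=float) - a) / l
    return np.where(np.abs(s) <= 0.5, g * (1.0 - 4.0 * s**2), 0.0)


# Example 2 inlet: g = 3, l = 0.2, a = 0.8 on x = 0.
t = np.linspace(0.0, 1.0, 11)
speed = parabolic_speed(t, g=3.0, l=0.2, a=0.8)
```

The profile reaches its maximum $g$ at the midpoint $t = a$ and vanishes at both ends of the section, so the imposed velocity is continuous with the no-slip condition on the rest of the boundary.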


Example 1 In this example, we consider the design of a bend, which has been tested with the level set method in [10, 14, 19]. The design domain is presented in Fig. 1a. Let $g = 1$ at both the inlet and the outlet, and set the fluid fraction to $\beta = 0.08\pi$. Here, we use our algorithm to obtain the optimal design on a $128 \times 128$ grid. We assume the initial distribution $\chi = 0$ in the whole domain and set the parameter $\alpha = 1.5 \times 10^4$ throughout this example.


The boundary conditions in this example are slightly different from those in [10, 19] but are the same as those in [14]. On the $128 \times 128$ grid, we first test the example for different Reynolds numbers, with the other parameters set as $\tau = 0.001$ and $\gamma = 0.0001$. The optimal design results, together with the velocity fields and the energy-decay curves, are displayed in Fig. 2 for $Re = 10$, 100, and 1000, respectively. It was mentioned in [46] that the radius of curvature of the fluid domain decreases as the Reynolds number increases. This phenomenon can also be observed in Fig. 2, and the optimal results are consistent with those obtained by the level set methods in [10, 14, 19].


Fig. 2 (Example 1) Left to right: Optimal results and the corresponding energy-decay curves for $Re = 10$, 100, and 1000. The parameters are set as $\tau = 0.001$, $\gamma = 0.0001$, and $\alpha = 1.5\mu \times 10^4$


Fig. 3 (Example 1) Energy curves for $\alpha = 1.5 \times 10^4$ and $Re = 10$. Left: for fixed $\gamma = 0.0001$, energy curves for $\tau = 0.02, 0.005, 0.001$. Right: for fixed $\tau = 0.001$, energy curves for $\gamma = 0.0005, 0.0001, 0.00005$

Furthermore, we numerically check the sensitivity of the energy-decay behavior to $\tau$ and $\gamma$. In Fig. 3, we display the energy-decay curves for different choices of $\tau$ and $\gamma$ with fixed $Re = 10$. We observe that the energy converges to almost the same value in all cases. In addition, the final design results we obtained are identical to the left one in Fig. 2.

Fig. 4 (Example 2) Left to right: Optimal results and energy curves for $\beta = 0.5$ and $\beta = 0.4$


Example 2 We test the example presented in Fig. 1b, which has one parabolic inlet and four parabolic outlets. We assume $g = 3$, $l = 0.2$, and $a = 0.8$ on the inlet boundary $x = 0$. For the four outlets, we let $(g, l, a) = (1, 0.1, 0.8)$, $(1, 0.1, 0.65)$, $(1, 0.2, 0.7)$, and $(1, 0.2, 0.25)$ on $y = 0$, $y = 1$, $x = 1$, and $x = 1$, respectively. This example has been tested with the phase field method in [18] with the same boundary conditions. Here, we use our algorithm to obtain the final optimal result on a $256 \times 256$ grid. Throughout this example, we set $\tau = 0.001$, $\gamma = 0.01$, $\alpha = 1.5 \times 10^4$, and $Re = 10$.


For the initial distribution x = 1 — Xi y:xe(0,1).ye(!. 5) We test this example for 
different fluid fractions $\beta$. For the left graph of Fig. 4 with $\beta = 0.5$, we obtain the optimal result after 40 iterations. For the right graph of Fig. 4 with $\beta = 0.4$, the optimal result is obtained after 38 iterations. The final results in Fig. 4 have a treelike structure, which is consistent with that obtained using the phase field method in [18]. The energy-decay curves for the different fluid fractions $\beta$ are also displayed in Fig. 4.


Example 3 In this example, we consider the minimization of the power dissipation in a four-terminal device. We set $g = 1$ for the two inflows and impose homogeneous Neumann boundary conditions on parts of the top and bottom boundaries with centers $[0.5, 0]$ and $[0.5, 1]$ (see Fig. 1c). The fluid fraction is set to $\beta = 0.4$. Here, we use our algorithm to compute the optimal configurations on $128 \times 128$ and $256 \times 256$ grids.


Fig. 5 (Example 3) Left to right: Optimal results and energy curves on a $128 \times 128$ grid and a $256 \times 256$ grid. The parameters are set as $\tau = 0.001$, $\gamma = 0.0001$, $\alpha = 2.5\mu \times 10^4$, and $Re = 1$


We test the case $\tau = 0.001$, $\gamma = 0.0001$, $\alpha = 2.5 \times 10^4$, and $Re = 1$ on
128 x 128 and 256 x 256 grids. The initial distribution is set as x —1— 
Xtiw.y):xeO.1), ye, 3)- In Fig.5, we observe that the final optimal configuration is 
consistent with the result obtained using the level set method in [10]. The final results for the different grids are almost the same, which indicates that our algorithm is insensitive to the grid for this example. Furthermore, the energy-decaying property can be observed in Fig. 5.


Example 4 In this example, we consider a three-terminal device on the design domain displayed in Fig. 1d. We set $g = 1$ on the two inflows and the homogeneous Neumann boundary condition on the outflow. The fluid fraction is set to $\beta = 0.3$, and we test this example on a $128 \times 128$ grid with $\tau = 0.0005$, $\gamma = 0.0002$, and $\alpha = 1.5\mu \times 10^4$.


In this example, we study the relation of optimal configurations on different 
choices of Reynolds numbers. Based on the initial x — 1 — Xt(x,y):xe(0,D.ye( D)» 
the final optimal design results with the velocity fields for $Re = 20$ and 500 are displayed in Fig. 6. We observe that the branches of the fluid configuration gradually separate from each other as the Reynolds number increases. The energy-decay curves are also displayed; the iteration converges in about 20 steps for $Re = 20$ and about 25 steps for $Re = 500$.


Fig. 6 (Example 4) Left to right: Optimal configurations and energy-decay curves for $Re = 20$ and 500


4.2 Three Dimensional Results 


In this section, we show the performance of the algorithm on several three-dimensional problems for the design domains in Fig. 7. In the following examples, the magnitude of the velocity for the Dirichlet boundary condition on a slice is set as

$$|u_D| = g\left(1 - \frac{(s_1 - a)^2 + (s_2 - b)^2}{l^2}\right),$$

where $g$ is the prescribed velocity at the center $(a, b)$ of a circle in which the inflow/outflow velocity is imposed, $l$ is the radius of the circle, and $(s_1, s_2)$ are Cartesian coordinates on the slice.


(a) The design domain of Example 5. (b) The design domain of Example 6.


Fig. 7 Design domains of three-dimensional examples


Example 5 In this example, we consider the multi-outlet problem in Fig. 7a. For the
inflow, we set g = 1, | = 0.2, and (a, b) = G, 3) on x = 0 plane. For the outflow, 
we set $l = 0.1$, $g = 1$, and $(a, b) = (0.8, 0.5)$, $(0.8, 0.5)$, $(0.8, 0.5)$, and $(0.8, 0.5)$ on the $y = 0$, $y = 1$, $z = 0$, and $z = 1$ planes, respectively. Throughout this example,
we choose the initial distribution with fluid domain in a region of {(x, y, z) : x € 
(0,D,y €e (0, 1),z € G, 2)), and set $ = 0.2, a = 2.5 x 10^ and Re = 20. 


We first test the case $\tau = 0.005$ and $\gamma = 0.0001$ on $32 \times 32 \times 32$ and $85 \times 85 \times 85$ grids. The optimal results in the left graphs of Fig. 8 are consistent with those obtained using the level set method in [10]. In addition, the energy-decay curves in Fig. 8 show that the iteration converges in about 20 steps on the coarse grid and about 30 steps on the fine grid. In Fig. 9, we display slices of the $32 \times 32 \times 32$ result on the $z = 0.5$ and $y = 0.5$ planes.

Next, we compute the results for different $\tau$ and $\gamma$ on the $32 \times 32 \times 32$ grid. The energy curves for $\gamma = 0.0001$ and $\tau = 0.01, 0.005, 0.001$ are displayed in the left graph of Fig. 10, and the energy curves for $\tau = 0.005$ and $\gamma = 0.001, 0.0005, 0.0001$ are displayed in the right graph of Fig. 10. We observe that the energy converges to almost the same value for the different $\gamma$ and $\tau$.


Example 6 Here, we consider an example with two inlets and four outlets. The design domain is shown in Fig. 7b. For the two inflows, let $g = 2$, $l = 0.05$, and $(a, b) = (0.5, 0.5)$ on the $x = 0$ and $x = 1$ planes, respectively. For the four outflows, we set $g = 1$, $l = 0.05$, and $(a, b) = (0.5, 0.5)$ on the $y = 0$, $y = 1$, $z = 0$, and $z = 1$ planes, respectively. In this example, we use our algorithm to obtain the final optimal result for $\tau = 0.001$, $\gamma = 0.0001$, $\alpha = 2.5 \times 10^4$, and $Re = 1$. The initial distribution of


fluid region is set as (x, y,z): x € (0,1), y e(0, D, ze G, z)}. 
Fig. 8 (Example 5) Left to right: Optimal configurations on different grids (top: $32 \times 32 \times 32$, bottom: $85 \times 85 \times 85$) and energy curves. The parameters are set as $\tau = 0.005$, $\gamma = 0.0001$, $\alpha = 2.5 \times 10^4$, and $Re = 20$


Fig. 9 (Example 5) Slices on the $85 \times 85 \times 85$ grid for $\tau = 0.005$, $\gamma = 0.0001$, $\alpha = 2.5 \times 10^4$, and $Re = 20$. Left: slice on the $z = 0.5$ plane. Right: slice on the $y = 0.5$ plane


222 H. Leng et al. 




Fig. 10 (Example 5) Energy curves for $\alpha = 2.5\mu \times 10^4$ and $Re = 20$. Left: for fixed $\gamma = 0.0001$, energy curves for $\tau = 0.01, 0.005, 0.001$. Right: for fixed $\tau = 0.005$, energy curves for $\gamma = 0.0005, 0.0001, 0.00005$
Fig. 11 (Example 6) Left to right: Optimal configurations on different grids (top: $64 \times 64 \times 64$, bottom: $90 \times 90 \times 90$) and energy-decay curves. The fluid fraction is $\beta = 0.18$


For the fluid fraction $\beta = 0.1$, we design optimal configurations on $64 \times 64 \times 64$ and $90 \times 90 \times 90$ grids. The final results for the coarse and fine grids, with the corresponding energy-decay curves, are displayed in Fig. 11. We observe that the interface is smoother on the fine mesh and that the iteration converges in 25 and 30 steps for the coarse and fine grids, respectively.
Fig. 12 (Example 6) Left to right: Optimal configurations for different $\beta$ (top: $\beta = 0.1$, bottom: $\beta = 0.18$), energy-decay curves, and slices on the $y = 0.5$ plane


On the $64 \times 64 \times 64$ grid, we check the dependence of the results on the choice of $\beta$. In Fig. 12, we display the results, the energy-decay curves, and slices on the $y = 0.5$ plane for the optimal shapes obtained with $\beta = 0.1$ and 0.18. The iteration converges in about 25 steps for $\beta = 0.1$ and about 20 steps for $\beta = 0.18$. Fig. 12 also shows that the solid domain in the center shrinks as $\beta$ increases.


5 Conclusion 


In this paper, we present an efficient threshold dynamics method for topology optimization for Navier-Stokes flow. This is an extension of our previous work [9] to the case of Navier-Stokes flow. We aim to minimize a total energy functional that consists of the potential power and the perimeter, approximated by a nonlocal energy. Unlike the algorithm in [9], each iteration of the algorithm requires solving not only the Brinkman equation but also an adjoint problem, both by the mixed finite element method. The indicator functions of the fluid-solid regions are then updated by a thresholding step based on convolutions evaluated by the FFT. A simple adaptive time strategy is used to accelerate the convergence of the algorithm. Numerical examples are presented to verify the efficiency of the new algorithm, and the total-energy-decaying property of the proposed algorithm is observed numerically. The proposed algorithm is simple and easy to implement. In the numerical experiments that we have performed, the proposed algorithm always finds an optimal shape, and the numerical results are relatively insensitive to the initial guesses and parameters.
Dynamics of Complex Singularities of Nonlinear PDEs


Analysis and Computation


J. A. C. Weideman 


Abstract Solutions to nonlinear evolution equations exhibit a wide range of interesting phenomena such as shocks, solitons, recurrence, and blow-up. As an aid to understanding some of these features, the solutions can be viewed as analytic functions of a complex space variable. The dynamics of poles and branch point singularities in the complex plane can often be associated with the aforementioned features of the solution. Some of the computational and analytical results in this area are surveyed here. This includes a first attempt at computing the poles in the famous Zabusky–Kruskal experiment that led to the discovery of the soliton.


1 Introduction 


Ever since Kruskal [22] remarked that soliton motion may be thought of as a “parade 
of poles,” the study of complex pole dynamics in nonlinear wave equations has been 
an active research field. This paper is an overview of the field, using some of the
well-known model problems, including the Korteweg-de Vries equation that prompted
Kruskal’s remark. The plan is to take these equations, some of them dissipative 
and others dispersive, and start them all with the same set of initial and boundary 
conditions. Using analysis where we can and numerical computation otherwise, we 
shall then track the evolution of the complex singularities. The singularity dynamics 
of the various equations will be contrasted, and also connected to the typical nonlinear 
features associated with these equations such as shock formation, soliton motion, 
finite time blow-up, and recurrence. Of particular interest here is how the
singularities enter when the initial condition has no singularities in the finite
complex plane.
We consider equations of the form
$$u_t + u u_x = L(u), \qquad t > 0, \quad -\pi < x < \pi, \tag{1}$$


and assume $2\pi$-periodic solutions in the space variable, $x$. The linear operator on
the right can be any one of 


$$L(u) = \nu\, u_{xx} \qquad \text{(Burgers)}, \tag{2}$$
$$L(u) = -\nu\, u_{xxx} \qquad \text{(Korteweg-de Vries)}, \tag{3}$$
$$L(u) = \nu\, H\{u_{xx}\} \qquad \text{(Benjamin-Ono)}, \tag{4}$$


where $\nu$ is a nonnegative constant and $H$ denotes the periodic Hilbert transform,
defined below. As initial condition we consider 


$$u(x,0) = -\sin(x), \tag{5}$$


the particular form of which allows us to make connections to several works of his- 
torical interest, namely papers by Cole [10], Hopf [21], Platzman [27], and Zabusky 
and Kruskal [39]. 

The numerical procedure we follow is similar to the one proposed in [35]. The 
first step involves a Fourier spectral method in space and a numerical integrator in
time to compute the solution on $[-\pi, \pi] \times [0, T]$. The second step is to continue the
solution at any time $t$ in $[0, T]$ into the complex $x$-plane. For the continuation we
use a Fourier-Padé method, although other possibilities are considered as well.

In order to identify and display poles and branch points in the complex plane, 
we shall plot what is called the “analytical landscape” in [34]. With the solution 
$f(x)$ expressed in polar form $re^{i\theta}$, the software of [34] can be used to generate a
3D plot in which the horizontal axes represent the real and imaginary components
of $z = x + iy$, the height represents the modulus $r$, and colour represents the phase
$e^{i\theta}$. The two examples in Fig. 1 should clarify this visualization.

The outline of the paper is as follows: The inviscid Burgers equation and its viscous 
counterpart are discussed, respectively, in Sects. 2 and 3. Here, analysis provides the 
exact locations of the branch point singularities in the inviscid case and approximate 
locations of the poles in the case of small viscosity. For the other PDEs considered 
here, namely Benjamin-Ono (BO) in Sect. 4 and Korteweg-de Vries (KdV) in Sect. 5, 
analytical results are harder to come by and we resort to the numerical procedure 
mentioned above. The nonlinear Schrödinger equation (NLS) also makes an appear- 
ance in our discussion of recurrence in Sect. 6. In the final section we discuss the 
details of the numerical methods employed in the earlier sections. 

Novel results presented here include the pole dynamics of the BO, KdV, and NLS 
equations. Related studies of KdV were undertaken in [7, 17], but these authors
did not consider the Zabusky-Kruskal experiment which is our focus here. Pole
behaviour in KdV and NLS was also discussed in the papers [11, 22] and [9, 23], 
respectively, but those analyses were based on cases where explicit solutions are 
Fig. 1 Analytical landscapes of the functions $f(z) = 1/z$ (top left), and $f(z) = z^{1/2}$ (top right).
The height represents the modulus and the colour represents the phase, as defined by the NIST 
standard colour wheel (bottom); see [13]. For details about the software used to produce these 
figures, see [34] 


available. Moreover, in those papers the poles were already present in the initial 
condition. Here, our interest is in the situation where the singularities are “born” at 
infinity. 

Although this paper focuses only on simple model equations such as (1)-(4), pole 
dynamics have been studied in more complex models, particularly in the water wave 
context. Among the many references are [3, 7, 14]. 


2 The Inviscid Burgers Equation


The inviscid Burgers equation, $u_t + u u_x = 0$, subject to the initial condition (5),
develops a shock at (x, t) = (0, 1), as can be verified by the method of characteristics. 
It also admits an explicit Fourier series solution [27] 


$$u(x,t) = -2\sum_{k=1}^{\infty} c_k(t)\sin(kx), \qquad c_k(t) := \frac{J_k(kt)}{kt}, \tag{6}$$

valid for $0 < t < 1$. The $J_k$ are the Bessel functions of the first kind. This series is of
limited use for numerical purposes, however, particularly for continuation into the 
complex plane. When truncated, it becomes an entire function and will not reveal 
Fig. 2 Solution to the inviscid Burgers equation as computed by applying Newton iteration to the
implicit solution formula (7). The four frames correspond to increasing values of $t$ (in the usual
order), the last being $t = 1$. The thicker black curve is the real-valued solution on the real axis,
displaying the typical steepening of the curve until the shock forms in the last frame. The solution
in the upper half-plane is displayed in the format of Fig. 1. The solution in the lower half-plane is
not shown because of symmetry. The black dot represents a branch point singularity that travels
along the imaginary axis according to (9). By referring to the colour wheel of Fig. 1, one can see
that on the imaginary axis there is no jump in phase between the origin and the branch point (in
some printed versions the abrupt change in phase may appear to be discontinuous, but it is not).
From the branch point to $+i\infty$, however, there is a phase jump consistent with a singularity of
quadratic type


much singularity information other than perhaps the location and type of the singu- 
larity nearest to the real axis [26, 32]. 
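Although the truncated series is of little use for continuation, it is easy to evaluate on the real axis, where it gives an independent check on the implicit formula used next. The following Python sketch (SciPy assumed; the names `fubini_series` and `implicit_residual` and the truncation `K` are illustrative choices, not taken from the paper) sums the first $K$ terms of (6) and measures how well the result satisfies $u = -\sin(x - ut)$.

```python
import numpy as np
from scipy.special import jv


def fubini_series(x, t, K=100):
    """Truncated series (6) for the inviscid Burgers equation, 0 < t < 1.

    u(x,t) = -2 * sum_{k=1}^K c_k(t) sin(kx),  c_k(t) = J_k(kt) / (kt).
    """
    k = np.arange(1, K + 1)
    c = jv(k, k * t) / (k * t)          # Bessel functions of the first kind
    return -2.0 * np.sin(np.outer(x, k)) @ c


def implicit_residual(x, u, t):
    """Residual u - f(x - u t) of the implicit relation, with f(x) = -sin(x)."""
    return u + np.sin(x - u * t)


x = np.linspace(-np.pi, np.pi, 201)
u = fubini_series(x, t=0.5)
print(np.max(np.abs(implicit_residual(x, u, 0.5))))   # expected to be tiny for 0 < t < 1
```

Away from $t = 1$ the coefficients decay exponentially, so a modest $K$ suffices; as $t \to 1^-$ the decay slows and $K$ has to grow.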
Instead, for numerical purposes we shall use the implicit solution formula 


$$u = f(x - ut), \qquad f(x) = -\sin(x). \tag{7}$$


This transcendental equation can be solved by Newton iteration for values of x in the 
complex plane. One can start at a small time increment, say $t = \Delta t$, use $u = f(x)$ as
initial guess, and iterate until convergence. Then $t$ is incremented to $2\Delta t$, the initial
guess is updated to the current solution, and the process is repeated. Figure 2 shows 
the corresponding solutions in the visualization format described in the introduction. 
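The time-continued Newton iteration just described can be sketched as follows. This is a minimal NumPy illustration under our own choices (the function name `newton_inviscid`, the number of time steps and the stopping tolerance are assumptions, not the author's settings); every grid point is treated independently, so a complex array of $z$-values can be passed in directly.

```python
import numpy as np


def newton_inviscid(z, t_final, nsteps=100, tol=1e-13, maxit=50):
    """Solve u = f(z - u t), f(x) = -sin(x) (eq. (7)), at complex points z.

    Starting from u = f(z) at t = 0, time is advanced in nsteps equal increments
    and the solution at the previous time is the Newton initial guess.
    """
    z = np.asarray(z, dtype=complex)
    u = -np.sin(z)                          # u = f(z) at t = 0
    for t in np.linspace(0.0, t_final, nsteps + 1)[1:]:
        for _ in range(maxit):
            xi = z - u * t
            g = u + np.sin(xi)              # residual u - f(xi)
            dg = 1.0 - t * np.cos(xi)       # derivative of the residual: 1 + t f'(xi)
            du = g / dg
            u = u - du
            if np.max(np.abs(du)) < tol:
                break
    return u


# Upper half-plane strip below the branch point (Im z_s is about 0.45 at t = 0.5).
X, Y = np.meshgrid(np.linspace(-np.pi, np.pi, 61), np.linspace(0.0, 0.3, 16))
Z = X + 1j * Y
U = newton_inviscid(Z, t_final=0.5)
```

The Newton derivative $1 + t f'(\xi)$ vanishes exactly at the branch point, which is why points very close to the singularity converge slowly or not at all.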

The figure shows one member of a conjugate pair of branch point singularities,
born at $+i\infty$, which travels down the positive imaginary axis and meets its conjugate
partner (not shown) at $(x, t) = (0, 1)$ when the shock occurs. This behaviour was first
reported in [5, 6], where a cubic polynomial was used as initial condition (similar 
to the first two terms in the Taylor expansion of (5)). In the cubic case, eq. (7) can 
be solved explicitly by Cardano’s formula, which enabled a complete description 
of the singularity dynamics as summarized in [5, 6, 28, 29]. In our case, the initial 
condition is trigonometric and therefore Cardano’s formula is not applicable. It is 
nevertheless possible to find the singularity locations and their type explicitly. 

The singularity location, say $z = z_s$, and the corresponding solution value, say
$u = u_s$, are defined by the simultaneous equations


$$u_s = f(z_s - u_s t), \qquad 1 = -t f'(z_s - u_s t), \tag{8}$$


the latter equation representing the vanishing Jacobian of the mapping; see for
example [26]. With $f(x)$ defined by (5), the solution is, for $0 < t < 1$,


$$z_s = \pm i\left(\tanh^{-1}\sqrt{1-t^2} - \sqrt{1-t^2}\right), \qquad u_s = \mp i\, t^{-1}\sqrt{1-t^2}. \tag{9}$$


These formulas are consistent with the solution shown in Fig. 2. A graph of the
singularity location as a function of time is shown as the dash-dot curve in Fig. 3 of
the next section.
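Formula (9) can be checked directly against (8). Its imaginary part can also be compared with the decay of the Fourier coefficients in (6): for $0 < t < 1$ the standard large-order (Debye) asymptotics of $J_k(kt)$ gives $|c_k(t)| \propto k^{-3/2} e^{-k\,\operatorname{Im} z_s}$, a concrete instance of the earlier remark that the coefficients only reveal the singularity nearest to the real axis. The sketch below (SciPy assumed; the function names and the sample orders `k1`, `k2` are our own choices) does both checks for the upper-half-plane singularity.

```python
import numpy as np
from scipy.special import jv


def branch_point(t):
    """Upper-half-plane singularity z_s and value u_s from (9), 0 < t < 1."""
    w = np.sqrt(1.0 - t**2)
    z_s = 1j * (np.arctanh(w) - w)
    u_s = -1j * w / t
    return z_s, u_s


def check_eq8(t):
    """Absolute residuals of the two equations (8) with f(x) = -sin(x)."""
    z_s, u_s = branch_point(t)
    xi = z_s - u_s * t
    r1 = u_s + np.sin(xi)           # u_s - f(xi)
    r2 = 1.0 - t * np.cos(xi)       # 1 + t f'(xi)
    return abs(r1), abs(r2)


def decay_rate(t, k1=40, k2=80):
    """Decay rate of c_k(t) = J_k(kt)/(kt), after removing the k^(-3/2) factor."""
    g = lambda k: np.log(abs(jv(k, k * t) / (k * t)) * k**1.5)
    return -(g(k2) - g(k1)) / (k2 - k1)


t = 0.5
print(check_eq8(t))                            # both residuals at rounding-error level
print(branch_point(t)[0].imag, decay_rate(t))  # the two numbers should nearly agree
```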

Further analysis shows that the singularity is of quadratic type, consistent with 
the phase colours in Fig. 2 and in agreement with the analysis of [5, 6, 28, 29] for the 
cubic initial condition. When t = 1, i.e., at the time the shock occurs, the singularity 
type changes from quadratic to cubic. The Riemann surface structure associated with 
this is discussed in [5, 6], in connection with the cubic initial condition. 


3 The Viscous Burgers Equation 


When viscosity is added, i.e., $\nu > 0$ in the Burgers equation (1)-(2), shock formation
does not occur. In the complex plane interpretation this means the singularities do not
reach the real axis. Moreover, they become strings of poles rather than the branch
points observed in the previous section. The poles travel in conjugate pairs from
$\pm i\infty$, with rapid approach towards the real axis, before turning around. They retrace
their steps along the imaginary axis at a more leisurely pace, and eventually recede
back to infinity, which ultimately leads to the zero steady state solution.¹
Analogously to (6), the Burgers equation subject to the initial condition (5) has an 
explicit series solution, this time not a Fourier series but a ratio of two such series: 


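$$u(x,t) = -2\nu\,\frac{\theta_x}{\theta}, \qquad \theta(x,t) := I_0\!\left(\tfrac{1}{2\nu}\right) + 2\sum_{k=1}^{\infty} (-1)^k I_k\!\left(\tfrac{1}{2\nu}\right) e^{-\nu k^2 t}\cos(kx). \tag{10}$$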


¹ A movie of the pole dynamics of this solution and some of the other solutions in this paper can
be found on the author’s web page [36]. 
Fig. 3 Left: Solution of the viscous Burgers equation (2), with $\nu = 0.1$, $t = 1$, as computed from
the series solution formula (10). Right: The locations on the positive imaginary axis of the first
four poles as a function of time. The dash-dot curve is the location of the branch-point singularity
when $\nu = 0$, as given by formula (9) (the pole curves approach the dash-dot curve asymptotically
as $t \to 0^+$ but could not be computed reliably for small values of $t$ because of ill-conditioning,
hence the gaps)


The $I_k$ are the modified Bessel functions of the first kind. This solution is derived
from the famous Hopf-Cole transformation; in fact, the above series is a special case
of one of the examples presented in the original paper of Cole [10]. Presumably the
solutions (6) and (10) can be connected in the limit $\nu \to 0^+$, but we found no such
reference in the literature.
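On the imaginary axis, $x = iy$, the cosines in (10) become hyperbolic cosines, so $\theta(iy, t)$ is real, and since $u = -2\nu\theta_x/\theta$ its simple zeros mark simple poles of $u$. The sketch below (SciPy assumed; the function names, the truncation `K` and the search interval are our own choices) locates them by bracketing sign changes and refining with Brent's method. It uses the exponentially scaled Bessel function `scipy.special.ive`, which multiplies $\theta$ and $\theta_x$ by the same factor $e^{-1/(2\nu)}$ and therefore leaves $u$ and the pole locations unchanged.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import ive


def theta_and_derivative(x, t, nu, K=80):
    """Series (10): theta and theta_x, both scaled by exp(-1/(2 nu)); x may be complex."""
    x = np.atleast_1d(np.asarray(x, dtype=complex))
    z = 1.0 / (2.0 * nu)
    k = np.arange(1, K + 1)
    a = (-1.0) ** k * ive(k, z) * np.exp(-nu * k**2 * t)
    kx = np.outer(x, k)
    theta = ive(0, z) + 2.0 * np.cos(kx) @ a
    theta_x = -2.0 * np.sin(kx) @ (k * a)
    return theta, theta_x


def burgers_u(x, t, nu, K=80):
    """Viscous Burgers solution u = -2 nu theta_x / theta."""
    theta, theta_x = theta_and_derivative(x, t, nu, K)
    return -2.0 * nu * theta_x / theta


def poles_on_imaginary_axis(t, nu, ymax=2.5, n=500):
    """Zeros y of theta(iy, t) in [0.05, ymax], i.e. poles of u at x = iy."""
    g = lambda y: theta_and_derivative(1j * y, t, nu)[0][0].real
    ys = np.linspace(0.05, ymax, n)
    vals = np.array([g(y) for y in ys])
    idx = np.nonzero(np.sign(vals[:-1]) != np.sign(vals[1:]))[0]
    return [brentq(g, ys[i], ys[i + 1]) for i in idx]


poles = poles_on_imaginary_axis(t=1.0, nu=0.1)
print(poles[:4])     # compare with the 'Exact' column of Table 1
```

Because $\theta$ involves strong cancellation near its zeros, this direct evaluation is reliable only for moderate $t$; it reflects the ill-conditioning for small $t$ noted in the caption of Fig. 3.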

The pole locations in Fig.3 can be computed from the series solution (10). For 
asymptotic estimates, however, a better representation is the integral form [10, 21]: 


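$$u(x,t) = \frac{\displaystyle\int_{-\infty}^{\infty} \frac{x-s}{t}\,\exp\!\left(\frac{F(x,s,t)}{2\nu}\right) ds}{\displaystyle\int_{-\infty}^{\infty} \exp\!\left(\frac{F(x,s,t)}{2\nu}\right) ds}. \tag{11}$$

In the case of the initial condition (5) the function $F$ is defined by

$$F(x,s,t) = 1 - \cos(s) - \frac{(x-s)^2}{2t}. \tag{12}$$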


To estimate the pole locations in the viscous Burgers equation one can analyze
the denominator of the formula in (11). Looking for poles on the positive imaginary
axis, we define, for $y > 0$,

$$D(y,t) = \int_{-\infty}^{\infty} \exp\!\left(\frac{1}{2\nu}F(iy,s,t)\right) ds. \tag{13}$$
A saddle point method can be used to estimate this integral when $0 < \nu \ll 1$. We
present an informal analysis here, focussed on an explanation of the situation shown
in Fig. 3. A more comprehensive analysis (for the cubic initial condition) can be
found in [28].

Figure 4 shows level curves of the real and imaginary parts of F(iy, s, t) in the 
complex s-plane, with y = 1 and t = 1. The figure reveals three saddle points, two 
in the upper half-plane and one in the lower half-plane. The contour of integration 
in (13) is accordingly deformed into the upper half-plane, in order to pass through 
the two saddle points. 

To estimate the saddle point contributions, we differentiate $F$ in (12) with respect to $s$
(and suppress the dependence on $y$ and $t$),

$$F'(s) = \sin(s) - \frac{s - iy}{t}, \qquad F''(s) = \cos(s) - \frac{1}{t}. \tag{14}$$


The saddle points are defined by $F'(s) = 0$, i.e.,
$$s - iy - t\sin(s) = 0. \tag{15}$$


No explicit solution of this equation seems to exist, but it can be checked that for
$t = 1$ and all $y > 0$ there is precisely one root on the negative imaginary axis, and two
roots in the upper half-plane, symmetrically located with respect to the imaginary
axis. The configuration shown in Fig. 4 can therefore be taken as representative of
all $y > 0$, except that the saddle points coalesce at the origin as $y \to 0^+$.

We label the roots in the first and second quadrants as $s_1$ and $s_2$, respectively, with
$s_2 = -\bar{s}_1$. The corresponding saddle point contributions are $D_1$ and $D_2$, where

$$D_j = \pm 2\sqrt{\frac{\pi\nu}{|F''(s_j)|}}\,\exp\!\left(\frac{F(s_j)}{2\nu} - \frac{i(\pi + \theta_j)}{2}\right), \tag{16}$$

where the upper (resp., lower) sign choice refers to $j = 1$ (resp., $j = 2$). The quantities
$\theta_j$ are defined by $F''(s_j) = |F''(s_j)|\,e^{i\theta_j}$.

The approximation to the denominator integral (13) is now given by $D \sim D_1 + D_2$
as $\nu \to 0^+$. After using the symmetry relationships between $s_1$ and $s_2$ noted above,
as well as the fact that $|F''(s_1)| = |F''(s_2)|$, this becomes

$$D \sim 4\sqrt{\frac{\pi\nu}{|F''(s_1)|}}\; e^{\operatorname{Re}F(s_1)/(2\nu)} \sin\!\left(\frac{\phi_1}{2\nu} - \frac{\theta_1}{2}\right), \qquad \phi_1 := \operatorname{Im}F(s_1). \tag{17}$$


In the second frame of Fig. 4 the graph of this function is shown as a function of y. 
In comparison with a high-accuracy quadrature approximation of the integral (13), 
the approximation (17) is seen to be quite accurate. The exception is for small values 
of y, because of the coalescence of the saddle points mentioned above. 
Fig. 4 Saddle point analysis for the viscous Burgers equation shown in Fig. 3. Left: The dots
are saddle points of F (yi, s, t), with y = 1, t = 1. The colour represents level curves of the real 
part of F (yi, s, t), and the dash-dot curves are level curves of the imaginary part. For the saddle 
point analysis the path of integration in (13), i.e., the real line, is deformed into the dash-dot curve 
in the upper half-plane that defines the steepest descent direction. The main contributions to the 
integral come from the regions in the neighbourhood of the saddle points. Right: The function 
D(y, 1), computed by numerical integration of (13) (solid curve), in comparison with the saddle 
point approximation (17) (dash-dot curve). The zeros of this function define the locations of the 
poles seen in Fig. 3 


Table 1 Left: Pole locations on the positive imaginary axis for the solution shown in Fig. 3, i.e.,
$t = 1$ and $\nu = 0.1$. The 'exact' values were computed by numerical quadrature of (13) and root
finding, both processes executed to high precision. The estimated values were computed by a
numerical solution of the two equations (15) and (18). Right: Turning points of the poles, i.e., the
coordinates $(t, y)$ of the local minima in the right frame of Fig. 3. These were computed by a
numerical solution of the two equations (15) and (18) in combination with a minimization procedure
with objective function $y$

k | Exact  | Estimated | Turning point t | Turning point y
1 | 0.4589 | 0.4527    | 1.7221          | 0.3469
2 | 0.9090 | 0.9068    | 1.1612          | 0.8991
3 | 1.2964 | 1.2952    | 0.8302          | 1.2822
4 | 1.6505 | 1.6498    | 0.6373          | 1.5684


Approximate pole locations can be computed as the zeros of (17), i.e.,
$$\phi_1 - \nu\theta_1 = 2\nu k\pi, \qquad k = 1, 2, \ldots, \tag{18}$$


which is solved simultaneously with the saddle point equation (15). In Table 1 we 
compare this estimate with the actual pole locations. 

The equations (15)-(18) can be used as basis for further analysis, both theoretical 
and numerical, of the pole locations. For example, by solving these equations numer- 
ically and simultaneously minimizing over y, the closest distance any particular pole 
gets to the real axis can be computed. These results are also summarized in Table 1. 
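The simultaneous solution of (15) and (18) can be sketched as a one-dimensional root-finding problem in $y$: for each $y$, Newton's method gives the saddle point $s_1$, and then $\phi_1 - \nu\theta_1$ is compared with $2\nu k\pi$. This is a minimal sketch under our own assumptions: the names are hypothetical, the Newton starting guess is derived from $s - \sin s \approx s^3/6$ and so is tailored to $t = 1$, and $\theta_1$ is taken as the principal argument in $(-\pi, \pi]$, the convention under which $k = 1$ labels the pole closest to the real axis.

```python
import numpy as np
from scipy.optimize import brentq


def saddle_s1(y, t=1.0, tol=1e-14, maxit=60):
    """First-quadrant root of s - i y - t sin(s) = 0, eq. (15), by Newton's method.

    Starting guess (6y)^(1/3) exp(i pi/6), from s - sin(s) ~ s^3/6 (suited to t = 1).
    """
    s = (6.0 * y) ** (1.0 / 3.0) * np.exp(1j * np.pi / 6.0)
    for _ in range(maxit):
        ds = (s - 1j * y - t * np.sin(s)) / (1.0 - t * np.cos(s))
        s = s - ds
        if abs(ds) < tol:
            break
    return s


def pole_condition(y, nu, t=1.0):
    """Left-hand side phi_1 - nu*theta_1 of (18), theta_1 = principal arg F''(s1)."""
    s1 = saddle_s1(y, t)
    F = 1.0 - np.cos(s1) - (1j * y - s1) ** 2 / (2.0 * t)   # (12) at x = iy
    F2 = np.cos(s1) - 1.0 / t                                # (14)
    return F.imag - nu * np.angle(F2)


def estimated_poles(nu, t=1.0, kmax=4, ys=np.linspace(0.05, 2.0, 400)):
    """Solve (15) and (18) for k = 1..kmax: bracket on a y-grid, refine with brentq."""
    g = np.array([pole_condition(y, nu, t) for y in ys])
    poles = []
    for k in range(1, kmax + 1):
        target = 2.0 * nu * k * np.pi
        idx = np.nonzero(np.sign(g[:-1] - target) != np.sign(g[1:] - target))[0]
        if idx.size:
            i = idx[0]
            poles.append(brentq(lambda y: pole_condition(y, nu, t) - target,
                                ys[i], ys[i + 1]))
    return poles


est = estimated_poles(0.1)
print(est)   # compare with the 'Estimated' column of Table 1
```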
Fig. 5 Finite time blow-up in the Burgers equation (2) with $\nu = 0.1$, subject to the complex initial
condition (19). The poles approach the origin from the positive imaginary direction, as can be seen
in the left frame, which corresponds to $t = 0.7$. In the right frame the leading pole has reached the
real axis, roughly at $t = 1$, which results in a blow-up (note that there is no upper/lower half-plane
symmetry as was the case in Fig. 2, so we show both half-planes in this figure)


In conclusion of this section on the Burgers equation we mention a lesser 
known fact, namely, that nonlinear blow-up is possible with complex initial data. 
For example, Fig. 5 shows the blow-up in the solution corresponding to the complex 
Fourier mode initial condition 


$$u(x,0) = -\sin(x) - i\cos(x). \tag{19}$$


Features such as the blow-up time or the minimum value of $\nu$ that allows blow-up
can be analyzed by the saddle point method outlined above, but we shall not pursue 
this here. 

When dispersion replaces diffusion in (1), the poles drift away from the imaginary 
axis. The pole behaviour is more complicated than in the Burgers case and the bigger 
the dispersive effects, the more intricate the behaviour. For this reason we tackle the 
less famous BO equation first, before getting to the more celebrated KdV equation. 


4 The Benjamin-Ono Equation 


The periodic Hilbert transform H in (4) can be defined as a convolution integral 
involving a cotangent kernel [19, Ch. 14], or, equivalently, in terms of Fourier series 


$$u(x,t) = \sum_{k=-\infty}^{\infty} a_k(t)\,e^{ikx} \quad\Longrightarrow\quad H\{u\} = \sum_{k=-\infty}^{\infty} (-i)\,\mathrm{sgn}(k)\,a_k(t)\,e^{ikx}. \tag{20}$$


When the nonlinear term in (1) is absent, both the BO and KdV equations are
linear dispersive wave equations. They admit travelling wave solutions
$u(x,t) = e^{i(kx - \omega t)}$ with dispersion relations $\omega = -\nu\,\mathrm{sgn}(k)k^2$ and $\omega = -\nu k^3$,
respectively. The quadratic vs cubic dependence on the wave number $k$ makes dispersive
effects in the BO equation less pronounced than in the KdV equation.

Fig. 6 Solutions to the Benjamin-Ono equation (1) and (4), corresponding to the initial condi-
tion (5), with $\nu = 0.1$. The pole dynamics of this solution can be seen in Fig. 7
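The Hilbert transform (20) and the BO dispersion relation are straightforward to realise with the FFT. The sketch below (NumPy; the function names are our own) multiplies the discrete Fourier coefficients by $-i\,\mathrm{sgn}(k)$, zeroing the unpaired Nyquist coefficient when the number of points is even. It then checks that $\cos(kx - \omega t)$ with $\omega = -\nu k^2$, $k > 0$, satisfies the linear BO equation $u_t = \nu H\{u_{xx}\}$.

```python
import numpy as np


def wavenumbers(M):
    """Integer wavenumbers in FFT order; the Nyquist entry is zeroed for even M."""
    k = np.fft.fftfreq(M, d=1.0 / M)
    if M % 2 == 0:
        k[M // 2] = 0.0
    return k


def hilbert_periodic(u):
    """Periodic Hilbert transform, eq. (20): multiply each coefficient by -i sgn(k)."""
    k = wavenumbers(u.size)
    return np.real(np.fft.ifft(-1j * np.sign(k) * np.fft.fft(u)))


def bo_operator(u, nu):
    """L(u) = nu H{u_xx} from (4), computed spectrally."""
    k = wavenumbers(u.size)
    return np.real(np.fft.ifft(nu * (-1j * np.sign(k)) * (-(k**2)) * np.fft.fft(u)))


M = 64
x = -np.pi + 2 * np.pi * np.arange(M) / M
nu, k, t = 0.1, 3, 0.7
omega = -nu * k**2                              # BO dispersion relation for k > 0
u = np.cos(k * x - omega * t)
u_t = omega * np.sin(k * x - omega * t)         # exact time derivative
print(np.max(np.abs(u_t - bo_operator(u, nu))))   # at rounding-error level
```

Replacing the multiplier by $i\nu k^3$ (the symbol of $-\nu\,\partial_x^3$) gives the KdV operator (3) and the relation $\omega = -\nu k^3$ in the same way.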

With the nonlinear term in (1) present, both the BO and KdV equations are com- 
pletely integrable and solvable, in principle, by the inverse scattering transform [1]. 
For arbitrary initial conditions and particularly with periodic boundary conditions, 
however, it is unlikely that all steps of the procedure can be completed successfully to 
obtain explicit solutions. Numerical methods will therefore be used to study singular- 
ity dynamics. As mentioned in the introduction, this consists of a standard method of 
lines procedure to obtain the solution on the real axis, followed by numerical analyt- 
ical continuation into the complex plane by means of a Fourier-Padé method. Details 
are postponed to Sect.7. Our choice of a Padé based method stems from the fact that 
singularities in both BO and KdV (next section) are expected to be poles. This is 
related to the complete integrability of these equations and the Painlevé property as 
discussed in [1, Sect. 2]. 

Figure 6 shows the solution on the real axis for the BO equation. Like diffusion, 
dispersion prevents shocks, but the mechanism is different: oscillations appear and 
separate into travelling wave solutions. In the case of KdV, this behaviour gave rise to 
the numerical discovery of the soliton, as discussed in Sect. 5. In the present example, 
about eight such solitons can be seen, perhaps most clearly identifiable in the pole 
parade shown in Fig. 7. 
t = 0.4

Fig. 7 Pole locations of a subset of the solutions of the BO equation shown in Fig. 6. Each soliton 
in that figure can be associated with a pair of conjugate simple poles in the complex plane. The 
poles that exit on the left re-enter on the right because of the periodic boundary conditions 


The initial pole behaviour is very similar to that observed in the Burgers equation, 
namely, the poles are born at infinity and start to travel in conjugate pairs towards 
the imaginary axes. Unlike the Burgers case, however, the poles do not remain on 
the imaginary axes but veer off into the left half-plane. Eight pairs can eventually be 
associated with the solitons shown in Fig. 6. 

In the absence of readily computable error estimates for our procedure we have 
used the following strategy to validate the results. Poles of the BO equation are simple, 
each with residue +2iv; see for example [8]. The order and residue of each pole can 
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be checked by contour integration on a small circle surrounding its location [35].? 
Using this technique, spurious poles and other numerical artifacts can be identified 
(one example of which is the slight irregularity near —3 + 0.81 in the third frame of 
Fig. 7.) 


5 The Korteweg-De Vries Equation 


In the case of KdV, the qualitative behaviour of the solutions is similar to that of the 
BO equation. The dispersion prevents shock formation in the solution by breaking 
it up into a number of solitons, which is the famous discovery of Zabusky and 
Kruskal [39]. The iconic figure from that paper is reprinted in Fig. 8. In the left frame
of Fig. 9 we reproduce that solution, but rescaled to the domain $[-\pi, \pi]$ in order to
facilitate comparisons with the other solutions shown in this paper.

The initial behaviour is the same as for the other equations we have seen thus 
far, namely, there are poles that enter from infinity and travel towards the real axis 
in conjugate pairs, roughly similar to the first two frames in Fig.7. As was the case 
for the BO equation, dispersion causes the poles to drift into the left half-plane and 
eventually re-enter in the right half-plane because of periodicity. The eight solitons 
marked in the Zabusky-Kruskal figure are clearly identifiable in the pole plot of
Fig. 9, with the poles closer to the real axis corresponding to the taller solitons.

We have used the same strategy mentioned at the end of Sect. 4 for validation of 
Fig. 9. In the case of KdV the poles are locally of the form $-12\nu/(z - z_0)^2$. The phase
information of Fig. 9, when viewed in colour, makes it clear that the computed poles
are indeed of order two, and contour integration confirmed the strength coefficient
of $-12\nu$.

It should be noted, however, that numerical analytical continuation is inherently 
ill-conditioned as one goes further into the complex plane, and that puts some limi- 
tations on our investigations. Two examples are as follows: 

First, for $t \ll 1$ we found that the Fourier-Padé based method was not able to
produce the theoretical pole information accurately, presumably because of the distance
between the real axis and the nearest singularity. Therefore no figures of this initial
phase of the evolution are presented here. Second, in the literature the existence of
'hidden solitons' in the Zabusky-Kruskal experiment is mentioned; see [12] (and
the references therein). In order to investigate these hidden solitons, the solution of 
Fig. 9 has to be continued much farther into the complex plane. Because of spurious 
poles and the ill-conditioning alluded to above, our efforts at tracking these hidden 
solitons were inconclusive. Both of these investigations are offered as a challenge to 
computational mathematicians. 

Here are two suggestions for such investigations. First, for KdV it is
recommended that the equation be transformed into the potential KdV equation, by


² The order of a pole can also be confirmed visually by examining the phase information in the pole
plots. 
Fig.8 The iconic figure of soliton formation in the KdV equation. The initial condition is u(x, 0) = 
cos(x) on [0, 2], with v = 0.0222. Reprinted, with permission, from [39]. Copyright (1965) by 
the American Physical Society 


t = 3.60 


Fig. 9 Left: the Zabusky-Kruskal solution shown in Fig. 8, after rescaling to $[-\pi, \pi]$. Right: the
corresponding poles in the complex plane


the substitution $u = v_x$; see [22]. This equation has simple poles, which makes it
better suited for approximation by Padé methods. Second, the use of multi-precision
arithmetic is advisable. Here, everything was done in IEEE double precision, mainly
because of the speed it offers to create animations of the pole parades [36].


6 Recurrence 


Historically, the discovery of the soliton in [39] overshadowed the fact that the 
objective of that paper was something else entirely, namely, the verification of the 
recurrence phenomenon previously discovered by Fermi, Pasta, Ulam, and Tsingou 
(FPUT) in yet another celebrated numerical experiment [16].³ In short, this means
that if a nonlinear system is started in a low mode configuration such as the initial 
condition (5), then higher modes are created by the nonlinear interaction, causing an 
energy cascade from low modes to high. The upshot of the FPUT experiment was 
that this process is not continued indefinitely, but eventually reverses with most of the 
energy flowing back to the low modes. The effect of this is that the initial condition 
is reconstructed briefly—approximately so and with a shift in phase—after a certain 
period of time. 

Numerical experiments with KdV such as those reported in Sect.5 do not reveal 
the recurrence behaviour in the pole dynamics. Had true recurrence occurred, the 
poles would have retraced their steps back along the imaginary axes out to infinity 
or would have cancelled somehow. The most we could observe at the purported 
recurrence time was a slight widening of the strip of analyticity around the real axis. 
This lack of a clear recurrence can be attributed to the fact that the phenomenon is 
rather weak in KdV, as discussed in detail in [20]. 

For a more convincing demonstration of recurrence one has to look outside the 
family (1)-(4). Perhaps the best PDE for this purpose is the NLS equation 


$$iu_t + u_{xx} + \nu|u|^2u = 0, \tag{21}$$


where the solution, $u(x,t)$, is complex-valued. We shall consider $\nu > 0$ (known as
the focussing case) and continue to work with $2\pi$-periodic boundary conditions. It
will be necessary, however, to modify our initial condition to have nonzero mean, so 
we consider 

$$u(x,0) = 1 + \varepsilon\cos x. \tag{22}$$


The corresponding solution is an $\varepsilon$-perturbation of the $x$-independent solution
$u = e^{i\nu t}$. Linearisation about this solution shows that the side-bands $e^{\pm inx}$ grow


exponentially for all integers n satisfying [37, 38] 


$$0 < n^2 < 2\nu. \tag{23}$$


That is, for $\nu \le 1/2$ there is no instability, for $1/2 < \nu \le 2$ a single pair of side-bands
is unstable, a double pair for $2 < \nu \le 9/2$, and so on. The instability is named after
Benjamin and Feir, who derived it not via the NLS but directly from the water wave 
setting [4]. The growth does not continue unboundedly but subsides, and recurrences 
occur at periodic time intervals. The connection between Benjamin-Feir instability 
and FPUT recurrence was pointed out in [38]. 
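Condition (23) and the corresponding growth rates are easy to tabulate. Substituting $u = e^{i\nu t}(1 + w)$ into (21) and linearising gives $iw_t + w_{xx} + \nu(w + \bar{w}) = 0$; for perturbations proportional to $\cos(nx)$ this yields growth rates $\sigma$ with $\sigma^2 = n^2(2\nu - n^2)$, which is positive exactly when (23) holds. (This short derivation is ours, under the normalisation of (21), and the function name below is hypothetical.)

```python
import numpy as np


def unstable_sidebands(nu):
    """Integers n >= 1 with 0 < n^2 < 2 nu (eq. (23)) and growth rates n*sqrt(2 nu - n^2)."""
    n = np.arange(1, int(np.floor(np.sqrt(2.0 * nu))) + 2)
    n = n[n**2 < 2.0 * nu]
    return n, n * np.sqrt(2.0 * nu - n**2)


modes, rates = unstable_sidebands(3.0)   # the parameter value used in Fig. 10
print(modes, rates)                      # two unstable side-band pairs, n = 1 and n = 2
```

For $\nu = 3$ the second mode grows faster than the first, which is consistent with the alternating dominance of $e^{\pm ix}$ and $e^{\pm 2ix}$ described next.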

The growth and recurrence pattern for a special case with two unstable modes 
can be seen in Fig. 10. In frames 2, 3 and 7, 8 the unstable mode $e^{\pm ix}$ dominates,
while $e^{\pm 2ix}$ dominates in frames 4, 5, and 6. An almost perfect recurrence occurs in
frame 9, after which time the process continues periodically. 


³ Since the mid-2000s it has been recognized that Mary Tsingou deserves credit for her
computations, and so the FPU experiment was renamed FPUT.
Fig. 10 Solutions to the nonlinear Schrödinger equation (21) corresponding to the initial condi-
tion (22), with $\nu = 3$, $\varepsilon = 0.1$. The unstable modes $e^{\pm ix}$ and $e^{\pm 2ix}$ take turns in dominating the
solution, with a near perfect recurrence at $t = 5$. The pole dynamics of the first phase of this solution
can be seen in Fig. 11


Pole locations of some of the solutions in Fig. 10 can be seen in Fig. 11. The first 
unstable mode is controlled by a conjugate pair of simple poles on the imaginary axis. 
The second is controlled by two pairs of conjugate poles, each pair symmetrically 
located with respect to the imaginary axis. The first frame shows the initial onset, 
with the poles on the imaginary axis leading the procession. The second frame is 
roughly where the first mode reaches its maximum growth, which corresponds to the 
point at which the poles reach their minimum distance to the real axis. In the third 
frame, these poles are receding back along the imaginary axes and are overtaken by 
the approaching secondary sets of poles. The last frame shows a situation where the 
second mode has become dominant. At the recurrence time, all of these poles will 
have receded back to infinity. 


7 Numerical Tools 


In this final section we review some of the numerical techniques that can be used 
in this field. Our discussion, which focuses primarily on Padé approximation and 
its variants, is by no means exhaustive. For other approaches, including tracking the 
t = 0.63    t = 1.25

Fig. 11 Pole locations of a subset of the solutions of the NLS equation shown in Fig. 10. In the
first two frames the unstable mode $e^{\pm ix}$ dominates, while $e^{\pm 2ix}$ dominates in the last two frames.
This is determined by which pairs of poles are closest to the real axis


poles through the numerical solution of certain dynamical systems, we refer to [7, 
26, 32, 33]. 
We limit the discussion to $2\pi$-periodic solutions that admit a Fourier series expansion
of the form

$$u(x,t) = \sum_{k=-\infty}^{\infty} c_k(t)\,e^{ikx}, \qquad -\pi \le x \le \pi. \tag{24}$$


In some rare cases the coefficients $c_k(t)$ are known explicitly; cf. (6). Otherwise, the
$c_k(t)$ can be computed numerically by a Fourier spectral method and the method of
lines [35]. In order to do this step as accurately as possible, it is necessary to truncate
the Fourier series to a large number of terms (here we used $|k| \le 256$ or $512$), and
also use small error tolerances in the time-integration (here on the order of 107? in 
the stiff integrator ode15s in MATLAB). 
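A minimal method-of-lines sketch of this first step is given below in Python. It is not the author's MATLAB code: SciPy's `solve_ivp` with the BDF method stands in for `ode15s`, the Fourier symbols of (2)-(4) are applied via the FFT, the nonlinear term is formed pointwise without dealiasing, and all names, grid sizes and tolerances are our own assumptions. As a check, the viscous Burgers case is compared with the Hopf-Cole series (10).

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.special import ive


def symbol(k, nu, equation):
    """Fourier symbol of L in (2)-(4): L(e^{ikx}) = symbol(k) e^{ikx}."""
    if equation == "burgers":
        return -nu * k**2                      # nu * (ik)^2
    if equation == "kdv":
        return 1j * nu * k**3                  # -nu * (ik)^3
    if equation == "bo":
        return 1j * nu * np.sign(k) * k**2     # nu * (-i sgn k) * (ik)^2
    raise ValueError(equation)


def solve_mol(nu, equation, t_final, M=128, rtol=1e-10, atol=1e-12):
    """Fourier spectral method of lines for u_t + u u_x = L(u), u(x,0) = -sin(x)."""
    x = -np.pi + 2 * np.pi * np.arange(M) / M
    k = np.fft.fftfreq(M, d=1.0 / M)
    k[M // 2] = 0.0                            # drop the Nyquist mode (M even)
    ik, Lk = 1j * k, symbol(k, nu, equation)

    def rhs(t, u):
        u_hat = np.fft.fft(u)
        u_x = np.real(np.fft.ifft(ik * u_hat))
        Lu = np.real(np.fft.ifft(Lk * u_hat))
        return -u * u_x + Lu

    sol = solve_ivp(rhs, (0.0, t_final), -np.sin(x), method="BDF",
                    rtol=rtol, atol=atol)
    return x, sol.y[:, -1]


def burgers_series(x, t, nu, K=80):
    """Hopf-Cole series (10); the ive scaling cancels in the ratio."""
    z = 1.0 / (2.0 * nu)
    k = np.arange(1, K + 1)
    a = (-1.0) ** k * ive(k, z) * np.exp(-nu * k**2 * t)
    theta = ive(0, z) + 2.0 * np.cos(np.outer(x, k)) @ a
    theta_x = -2.0 * np.sin(np.outer(x, k)) @ (k * a)
    return -2.0 * nu * theta_x / theta


x, u = solve_mol(0.1, "burgers", 0.5)
print(np.max(np.abs(u - burgers_series(x, 0.5, 0.1))))   # agreement check
```

For KdV the symbol grows like $k^3$, so the semi-discrete system is much stiffer than for Burgers; an implicit integrator, as in the text, is then essential.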

When truncated, the series (24) becomes an entire function and will not reveal 
much singularity information other than perhaps the width of the strip of analyticity 
around the real axis [32]. A more suitable representation is obtained by converting 
the truncated series to Fourier-Padé form. For a fixed value of t (suppressed for now 
in the notation) we convert the series to Taylor-plus-Laurent form by the substitution
$z = e^{ix}$:


$$u = \sum_{k=-\infty}^{\infty} c_k e^{ikx} = \sum_{k=0}^{\infty} c_k z^k + \sum_{k=0}^{\infty} c_{-k} z^{-k}. \tag{25}$$


(It is necessary to redefine $c_0 \to c_0/2$.) Each term on the right can be converted to a
type $(N, N)$ rational form as follows. Consider the first term and define


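$$f(z) = \sum_{k=0}^{\infty} c_k z^k, \qquad p(z) = \sum_{k=0}^{N} a_k z^k, \qquad q(z) = \sum_{k=0}^{N} b_k z^k. \tag{26}$$

One then requires that

$$f(z) \approx \frac{p(z)}{q(z)} \quad\Longrightarrow\quad p(z) - q(z)f(z) = O(z^{2N+1}). \tag{27}$$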


The latter equation can be set up as a linear system to solve for the coefficients $a_k$
and $b_k$ (after fixing one coefficient, typically $b_0 = 1$). The second term on the right
in (25) can be converted to rational form in the same way, which then gives the
approximation to $u(x)$ as the ratio of two Fourier series. The pole plots in Sects. 4,
5 and 6 were all computed using this Fourier-Padé approach.
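The construction (25)-(27) can be sketched as follows (NumPy; the function names and the test function are our own choices, and the linear system is solved directly, without the SVD-based regularisation that is often advisable in practice to suppress spurious poles). The coefficients come from the FFT of samples on $[0, 2\pi)$, $c_0$ is halved, the Taylor and Laurent parts are each converted to type $(N, N)$ form, and the zeros of the two denominators are mapped back to the $x$-plane via $x = -i\log z$ and $x = i\log w$, where $w = 1/z$. For the test function, a sum of two Poisson kernels, the exact poles satisfy $e^{ix} \in \{2, -10/3\}$ (lower half-plane) and $e^{ix} \in \{1/2, -0.3\}$ (upper half-plane).

```python
import numpy as np


def pade_coefficients(c, N):
    """Type (N, N) Pade approximant p/q of f(z) = sum c_k z^k, eqs. (26)-(27), b_0 = 1.

    Needs c_0, ..., c_{2N}.
    """
    c = np.asarray(c, dtype=complex)
    # Coefficients of z^m in q(z) f(z) must vanish for m = N+1, ..., 2N.
    A = np.array([[c[m - j] for j in range(1, N + 1)] for m in range(N + 1, 2 * N + 1)])
    b = np.concatenate(([1.0], np.linalg.solve(A, -c[N + 1:2 * N + 1])))
    a = np.convolve(b, c[:N + 1])[:N + 1]      # p = q f truncated at degree N
    return a, b


def fourier_pade_poles(u_samples, N):
    """x-plane poles of the Fourier-Pade approximant of samples at x_j = 2 pi j / M."""
    M = u_samples.size
    C = np.fft.fft(u_samples) / M              # C[k] ~ c_k, C[M-k] ~ c_{-k}
    taylor = np.array([C[0] / 2] + [C[k] for k in range(1, 2 * N + 1)])
    laurent = np.array([C[0] / 2] + [C[M - k] for k in range(1, 2 * N + 1)])
    _, b_t = pade_coefficients(taylor, N)
    _, b_l = pade_coefficients(laurent, N)
    z_t = np.roots(b_t[::-1])                  # poles in z = e^{ix}
    w_l = np.roots(b_l[::-1])                  # poles in w = 1/z = e^{-ix}
    return np.concatenate((-1j * np.log(z_t), 1j * np.log(w_l)))


M = 64
x = 2 * np.pi * np.arange(M) / M
poisson = lambda x, r: (1 - r**2) / (1 - 2 * r * np.cos(x) + r**2)   # sum r^|k| e^{ikx}
u = poisson(x, 0.5) + 0.5 * poisson(x, -0.3)
poles = fourier_pade_poles(u, N=2)
print(np.sort_complex(np.exp(1j * poles)))    # values of z = e^{ix} at the poles
```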

A promising alternative to the Padé approach to rational approximation is the 
so-called AAA method, recently proposed in [24], with subsequent extensions to 
the periodic case [25]. It is not implemented in coefficient space like (24)-(26),
but rather uses function values, easily obtained from (24) by an inverse discrete
Fourier transform. The representation is the barycentric formula for trigonometric 
functions [18] 
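$$u(x) = \frac{\displaystyle\sum_{k=1}^{M} (-1)^k \csc\!\left(\tfrac{1}{2}(x - x_k)\right) u_k}{\displaystyle\sum_{k=1}^{M} (-1)^k \csc\!\left(\tfrac{1}{2}(x - x_k)\right)}, \tag{28}$$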

applicable when M is odd (a similar formula holds for even M). When
$x_k = -\pi + (k-1)2\pi/M$ (i.e., evenly spaced nodes in $[-\pi, \pi)$) and $u_k = u(x_k)$,
then u(x) is identical to the series (26) when truncated to $|k| \le N$, where $2N + 1 = M$.
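A direct transcription of (28) makes the equispaced statement easy to check numerically. The sketch below is an illustration only: it assumes the weights $(-1)^k$ in (28) and odd $M$, and the function names are hypothetical. Because the test function is a trigonometric polynomial of degree at most $N = 2$, the formula with $M = 5$ equispaced nodes should reproduce it to rounding error.

```python
import math


def trig_barycentric(x, nodes, values):
    """Evaluate formula (28) with weights (-1)^k; M = len(nodes) is assumed odd."""
    num = den = 0.0
    for k, (xk, uk) in enumerate(zip(nodes, values), start=1):
        d = x - xk
        if d == 0:
            return uk  # x coincides with a node: return the data value
        w = (-1) ** k / math.sin(d / 2)
        num += w * uk
        den += w
    return num / den


def u_exact(x):
    """A trigonometric polynomial of degree 2 (so |k| <= N = 2)."""
    return 0.3 + math.cos(x) - 0.5 * math.sin(2 * x)


M = 5  # 2N + 1 = M with N = 2
nodes = [-math.pi + (k - 1) * 2 * math.pi / M for k in range(1, M + 1)]
values = [u_exact(xk) for xk in nodes]
for x in (0.7, -2.1, 3.0):
    print(x, trig_barycentric(x, nodes, values), u_exact(x))
```

Moving the nodes away from equispacing, as AAA does, turns the same expression into a trigonometric rational function rather than a truncated Fourier series.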

In the AAA algorithm the so-called support points $x_k$ are not chosen to be equidistant,
which changes the formula (28) from a truncated Fourier series to a rational
form. The choice of the $x_k$ proceeds adaptively so as to avoid exponential instabilities.

In preliminary numerical tests the trigonometric AAA algorithm was competitive
with the Fourier-Padé method described above, but further experimentation is needed
to determine which method is preferable in this particular application.

Neither of these two methods, however, can give much information on branch 
point singularities. One way of introducing branches into the approximant is quadratic 
Padé approximation [30], which is a special case of Hermite-Padé approximation [2]. 
Define a polynomial r(z) of degree N, similar to p(z) and q(z) in (26), and in analogy with the
rightmost expression in (27) define
$$p(z) + q(z) f(z) + r(z) f(z)^2 = O(z^{3N+2}). \tag{29}$$

Dropping the order term on the right yields

$$f(z) \approx \frac{-q(z) \pm \sqrt{q(z)^2 - 4p(z)r(z)}}{2r(z)}, \tag{30}$$


and when this is used to approximate the two terms on the right of (25), a two-valued
approximant to u(x) is obtained. Cubic and higher order approximants can be defined 
analogously, but will not be considered here. 
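To make (29)-(30) concrete, here is a minimal Python sketch for a single Taylor series f(z); applying it to each of the two series in (25) gives the two-valued approximant described above. It is an illustration under stated assumptions, not the authors' implementation: the normalization $r_0 = 1$ (one possible choice of fixed coefficient, which fails if the true $r$ vanishes at $z = 0$), plain Gaussian elimination for the $(3N+2) \times (3N+2)$ system, and all helper names are hypothetical. For the test function $f(z) = \sqrt{1-z}$, which has a branch point at $z = 1$, the quadratic approximant with $N = 1$ should be exact, with the two branches equal to $\pm\sqrt{1-z}$.

```python
import cmath


def solve(A, rhs):
    """Solve A x = rhs (complex entries) by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b] for row, b in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for i in range(col + 1, n):
            m = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= m * M[col][j]
    x = [0j] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def polyval(coef, z):
    """Horner evaluation of sum_k coef[k] z^k."""
    s = 0
    for a in reversed(coef):
        s = s * z + a
    return s


def quadratic_pade(c, N):
    """Degree-N polynomials p, q, r (normalized so r[0] = 1) such that
    p + q*f + r*f**2 = O(z**(3N+2)), given Taylor coefficients c[0..3N+1] of f."""
    L = 3 * N + 2
    f2 = [sum(c[j] * c[i - j] for j in range(i + 1)) for i in range(L)]  # Taylor coefficients of f^2
    A, rhs = [], []
    for i in range(L):  # one equation per power z^i, i = 0..3N+1
        row = [1 if i == j else 0 for j in range(N + 1)]              # p_j
        row += [c[i - j] if i >= j else 0 for j in range(N + 1)]      # q_j
        row += [f2[i - j] if i >= j else 0 for j in range(1, N + 1)]  # r_j, j >= 1
        A.append(row)
        rhs.append(-f2[i])  # contribution of r_0 = 1, moved to the right-hand side
    x = solve(A, rhs)
    return x[:N + 1], x[N + 1:2 * N + 2], [1] + x[2 * N + 2:]


def branches(p, q, r, z):
    """The two values of the approximant (30) at z."""
    P, Q, R = polyval(p, z), polyval(q, z), polyval(r, z)
    s = cmath.sqrt(Q * Q - 4 * P * R)
    return (-Q + s) / (2 * R), (-Q - s) / (2 * R)


# Taylor coefficients of sqrt(1 - z): c_k = c_{k-1} * (k - 3/2) / k.
N = 1
c = [1.0]
for k in range(1, 3 * N + 2):
    c.append(c[-1] * (k - 1.5) / k)
p, q, r = quadratic_pade(c, N)
for z in (0.5, 2.0, 0.3 + 1j):
    print(z, branches(p, q, r, z), cmath.sqrt(1 - z))
```

Note that the approximant stays accurate at $z = 2$, beyond the branch point, where a linear Padé approximant would have to mimic the branch cut with poles and zeros.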

Recall that Fig. 2 showed a solution of the inviscid Burgers equation with a branch 
point singularity. To test how accurately this singularity can be approximated by 
these methods, we solved the equation numerically as described below eq. (24). (We 
refrained from using the explicit series (6), which is too special.) The numerical 
solution (24) was then continued into the complex plane using the Fourier-Padé and 
quadratic Fourier-Padé approximations. Although we have a large number of Fourier
coefficients available, we found that the best results are obtained if only a fraction of
those are used in the Padé approximations. For the results shown here, we used only
35 terms in the series for f(z) in (26), which translates into a type (17, 17)
linear Fourier-Padé approximant (since 2N + 1 = 35) and a type (11, 11, 11) approximant in the
quadratic case (since 3N + 2 = 35).

The results are shown in Fig. 12. The middle figure is the reference solution,
computed to high accuracy by the Newton iteration described in Sect. 2. On the left 
is the approximation obtained by the linear Fourier-Padé approximant. Away from 
the imaginary axis the approximation is good, but it is poor on the axis itself. In the 
absence of branches in the approximant, a series of poles and zeros (the latter not 
clearly visible) appears as a proxy for the jump in phase. The fact that alternating poles 
and zeros ‘fall in the shadow’ of the branch point is a well-known phenomenon in 
standard Padé approximation [31], and is evidently also present in the trigonometric 
case.* On the other hand, the quadratic Fourier-Padé approximant shown on the right 
is virtually indistinguishable from the reference solution. 

The relative errors in these two approximations are shown in Fig. 13. The linear 
approximant has low accuracy near the imaginary axis because of the spurious poles 
mentioned above. By contrast, the quadratic approximant maintains high accuracy, 
even on the imaginary axis. If one takes the solution generated by the Newton method 
as exact, the quadratic approximant yields more than five decimal digits of accuracy 
in almost the whole domain shown in Fig. 13. 

Further discussion of numerical aspects of quadratic Padé approximants, including
their computation and conditioning, can be found in [15].


* Comparing the left frames of Figs. 3 and 12 is interesting. Both solutions can be viewed as a
perturbation of the multivalued solution shown in Fig. 2. In Fig. 3 the perturbation is caused by a
small amount of diffusion, while in Fig. 12 it is caused by numerical approximation. In both cases 
the proximity of the multivalued solution is revealed by a sequence of zeros and poles along the 
phase discontinuity. 


Dynamics of Complex Singularities of Nonlinear PDEs 245 


Fig. 12 Approximation of a branch point singularity in the inviscid Burgers equation, at t = 0.75. 
Left: a type (17, 17) linear Padé approximation. Middle: reference solution computed by Newton 
iteration from (7). Right: a type (11, 11, 11) quadratic Padé approximation 


Fig. 13 Relative errors in the approximation of the branch point singularity of Fig. 12. Left: the 
linear Padé approximation. Right: the quadratic Padé approximation. Bottom: the colour bar, on a
$\log_{10}$ scale, so that each change in shade represents roughly one decimal digit of accuracy